iconsulting

Commit Graph

Author	SHA1	Message	Date
hailin	73dee93d19	feat(docling): persist model cache via Docker volume - Add docling_models volume mounted at /models in container - Set HF_HOME=/models/huggingface at runtime (via docker-compose env) - Models download once → persist in volume → survive container rebuilds - Build-time preload uses || to not block build if network unavailable Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 07:18:14 -08:00
hailin	764613bd86	fix(docling): use standalone script for model pre-download Inline Python one-liner had syntax errors (try/except/finally can't be single-line). Move to scripts/preload_models.py for reliable execution. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 07:16:20 -08:00
hailin	d725864cd6	fix(docling): pre-download models during Docker build DocumentConverter() constructor only sets up config, models are lazily downloaded on first convert(). Fix by running an actual PDF conversion during build to trigger HuggingFace model download and cache. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 07:13:54 -08:00
hailin	57d21526a5	feat(knowledge): add Docling document parsing microservice Add IBM Docling as a Python FastAPI microservice for high-quality document parsing with table structure recognition (TableFormer ~94% accuracy) and OCR support, replacing pdf-parse/mammoth as the primary text extractor. Architecture: - New docling-service (Python FastAPI, port 3007) in Docker network - knowledge-service calls docling-service via HTTP POST multipart/form-data - Graceful fallback: if Docling fails, falls back to pdf-parse/mammoth - Text/Markdown files skip Docling (no benefit for plain text) Changes: - New: packages/services/docling-service/ (main.py, Dockerfile, requirements.txt) - docker-compose.yml: add docling-service, wire DOCLING_SERVICE_URL to knowledge-service, add missing FILE_SERVICE_URL to conversation-service - text-extraction.service.ts: inject ConfigService, add extractViaDocling() with automatic fallback to legacy extractors - .env.example: add FILE_SERVICE_PORT/URL and DOCLING_SERVICE_PORT/URL Inter-service communication map: conversation-service → file-service (FILE_SERVICE_URL, attachments) conversation-service → knowledge-service (KNOWLEDGE_SERVICE_URL, RAG) knowledge-service → docling-service (DOCLING_SERVICE_URL, document parsing) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 05:24:10 -08:00
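
The preload commits above (d725864cd6, 764613bd86) describe running one real conversion at build time so that the lazily loaded models get downloaded and cached, and moving that logic into a standalone script because try/except/finally cannot be written as a one-liner. The following is a minimal sketch of what such a script might look like, not the actual contents of scripts/preload_models.py. The preload() helper, its parameters, and the convention of passing a sample PDF path on the command line are all assumptions made for illustration; only DocumentConverter and its convert() method come from Docling itself.

```python
"""Hypothetical sketch of a model preload script (not the repository's file)."""
import os
import sys
import tempfile
from typing import Callable


def preload(convert: Callable[[str], object], sample_bytes: bytes) -> bool:
    """Run one conversion so lazily loaded models are fetched and cached.

    `convert` is any callable that takes a file path (assumed interface).
    Returns True on success and False on any failure; the temporary
    sample file is removed in both cases.
    """
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(sample_bytes)
        convert(path)  # the first conversion is what triggers the download
        return True
    except Exception as exc:  # e.g. no network during the image build
        print(f"model preload failed: {exc}", file=sys.stderr)
        return False
    finally:
        if os.path.exists(path):
            os.remove(path)


# Only runs when a sample PDF path is supplied (assumed convention).
if __name__ == "__main__" and len(sys.argv) == 2:
    from docling.document_converter import DocumentConverter

    with open(sys.argv[1], "rb") as fh:
        sample = fh.read()
    sys.exit(0 if preload(DocumentConverter().convert, sample) else 1)
```

Returning a non-zero exit code on failure leaves the decision to the caller: a Dockerfile step can append a shell `||` fallback so an unavailable network does not block the build, as commit 73dee93d19 describes, while the runtime volume still lets models download on first use.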

4 Commits