iconsulting/packages/services
hailin 57d21526a5 feat(knowledge): add Docling document parsing microservice
Add IBM Docling as a Python FastAPI microservice for high-quality document
parsing with table structure recognition (TableFormer ~94% accuracy) and
OCR support, replacing pdf-parse/mammoth as the primary text extractor.

Architecture:
- New docling-service (Python FastAPI, port 3007) in Docker network
- knowledge-service calls docling-service via HTTP POST multipart/form-data
- Graceful fallback: if the Docling call fails, extraction falls back to pdf-parse/mammoth
- Text/Markdown files skip Docling (no benefit for plain text)
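
The routing described above can be sketched roughly as follows. This is a minimal illustration, not the code in text-extraction.service.ts (which is TypeScript): the function name extract_text, the PLAIN_TEXT_EXTENSIONS set, and the injected via_docling/via_legacy callables are all hypothetical names chosen for the example.

```python
import os
from typing import Callable

# Assumption: which extensions count as "Text/Markdown" is not stated in the
# commit message; this set is illustrative only.
PLAIN_TEXT_EXTENSIONS = {".txt", ".md"}

# An extractor takes raw file bytes plus the filename and returns text.
Extractor = Callable[[bytes, str], str]


def extract_text(
    data: bytes,
    filename: str,
    via_docling: Extractor,
    via_legacy: Extractor,
) -> str:
    """Try Docling first, fall back to the legacy extractor on any failure."""
    ext = os.path.splitext(filename)[1].lower()

    # Plain text gains nothing from Docling, so it goes straight to the
    # legacy path.
    if ext in PLAIN_TEXT_EXTENSIONS:
        return via_legacy(data, filename)

    try:
        return via_docling(data, filename)
    except Exception:
        # Graceful fallback: any Docling error (network, timeout, parse
        # failure) is absorbed here and the legacy extractor is used instead.
        return via_legacy(data, filename)
```

Passing the two extractors in as callables keeps the sketch independent of any HTTP client; in the real service the Docling callable would wrap the multipart POST to DOCLING_SERVICE_URL.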

Changes:
- New: packages/services/docling-service/ (main.py, Dockerfile, requirements.txt)
- docker-compose.yml: add docling-service, wire DOCLING_SERVICE_URL to
  knowledge-service, add missing FILE_SERVICE_URL to conversation-service
- text-extraction.service.ts: inject ConfigService, add extractViaDocling()
  with automatic fallback to legacy extractors
- .env.example: add FILE_SERVICE_PORT/URL and DOCLING_SERVICE_PORT/URL

Inter-service communication map:
  conversation-service → file-service (FILE_SERVICE_URL, attachments)
  conversation-service → knowledge-service (KNOWLEDGE_SERVICE_URL, RAG)
  knowledge-service → docling-service (DOCLING_SERVICE_URL, document parsing)
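
As a rough illustration, the map above can be written down as data and used to check which URL variables a given service needs at startup. The SERVICE_CALLS name and the required_env_vars helper are hypothetical and do not exist in the repository; only the service names and variable names come from the map.

```python
from typing import Dict, List

# caller -> {callee: environment variable holding the callee's base URL}
SERVICE_CALLS: Dict[str, Dict[str, str]] = {
    "conversation-service": {
        "file-service": "FILE_SERVICE_URL",
        "knowledge-service": "KNOWLEDGE_SERVICE_URL",
    },
    "knowledge-service": {
        "docling-service": "DOCLING_SERVICE_URL",
    },
}


def required_env_vars(service: str) -> List[str]:
    """Return the URL variables a service needs, sorted alphabetically."""
    return sorted(SERVICE_CALLS.get(service, {}).values())
```

A service that calls nothing else, such as docling-service, yields an empty list.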

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 05:24:10 -08:00
conversation-service fix(agents): correct Claude API file size limits (image 5MB, PDF 25MB) 2026-02-07 04:55:54 -08:00
docling-service feat(knowledge): add Docling document parsing microservice 2026-02-07 05:24:10 -08:00
evolution-service fix(analytics): handle statDate as string from database 2026-01-26 08:40:09 -08:00
file-service fix(files): replace MinIO presigned URLs with API proxy + base64 for Claude 2026-02-07 04:49:39 -08:00
knowledge-service feat(knowledge): add Docling document parsing microservice 2026-02-07 05:24:10 -08:00
payment-service fix(tenant): use TenantContextModule.forRoot() for global tenant context 2026-01-26 04:31:56 -08:00
user-service fix(nginx): fix admin location try_files path and add multi-tenancy migrations 2026-01-26 07:46:52 -08:00