iconsulting/packages/services/knowledge-service
hailin 57d21526a5 feat(knowledge): add Docling document parsing microservice
Add IBM Docling as a Python FastAPI microservice for high-quality document
parsing with table structure recognition (TableFormer ~94% accuracy) and
OCR support, replacing pdf-parse/mammoth as the primary text extractor.

Architecture:
- New docling-service (Python FastAPI, port 3007) in Docker network
- knowledge-service calls docling-service via HTTP POST multipart/form-data
- Graceful fallback: if Docling fails, falls back to pdf-parse/mammoth
- Text/Markdown files skip Docling (no benefit for plain text)

Changes:
- New: packages/services/docling-service/ (main.py, Dockerfile, requirements.txt)
- docker-compose.yml: add docling-service, wire DOCLING_SERVICE_URL to
  knowledge-service, add missing FILE_SERVICE_URL to conversation-service
- text-extraction.service.ts: inject ConfigService, add extractViaDocling()
  with automatic fallback to legacy extractors
- .env.example: add FILE_SERVICE_PORT/URL and DOCLING_SERVICE_PORT/URL

Inter-service communication map:
  conversation-service → file-service (FILE_SERVICE_URL, attachments)
  conversation-service → knowledge-service (KNOWLEDGE_SERVICE_URL, RAG)
  knowledge-service → docling-service (DOCLING_SERVICE_URL, document parsing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 05:24:10 -08:00
..
src feat(knowledge): add Docling document parsing microservice 2026-02-07 05:24:10 -08:00
Dockerfile fix(docker): copy shared module AFTER npm install to prevent wipe 2026-01-26 03:41:16 -08:00
nest-cli.json Initial commit: iConsulting 香港移民咨询智能客服系统 2026-01-09 00:01:12 -08:00
package.json feat(knowledge): add file upload with text extraction for knowledge base 2026-02-06 22:58:19 -08:00
tsconfig.json Initial commit: iConsulting 香港移民咨询智能客服系统 2026-01-09 00:01:12 -08:00