iconsulting/packages/services/knowledge-service/src
hailin 57d21526a5 feat(knowledge): add Docling document parsing microservice
Add IBM Docling as a Python FastAPI microservice for high-quality document
parsing with table structure recognition (TableFormer ~94% accuracy) and
OCR support, replacing pdf-parse/mammoth as the primary text extractor.

Architecture:
- New docling-service (Python FastAPI, port 3007) in Docker network
- knowledge-service calls docling-service via HTTP POST with a multipart/form-data body
- Graceful fallback: if Docling fails, falls back to pdf-parse/mammoth
- Text/Markdown files skip Docling (no benefit for plain text)
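The routing and fallback rules above can be sketched as follows. This is a minimal illustration, not the code in text-extraction.service.ts: the `Extractor` signature, the `Extractors` bundle, and the helper names `extractText` and `legacyExtractorFor` are hypothetical, and the three extractors are injected so that the Docling call, pdf-parse, and mammoth can each be swapped for stubs.

```typescript
// Hypothetical extractor signature: raw bytes plus MIME type in, plain text out.
type Extractor = (data: Uint8Array, mimeType: string) => Promise<string>;

interface Extractors {
  docling: Extractor; // e.g. an HTTP call to docling-service
  legacyPdf: Extractor; // e.g. a wrapper around pdf-parse
  legacyDocx: Extractor; // e.g. a wrapper around mammoth
}

// Formats that gain nothing from layout analysis and are decoded directly.
const PLAIN_TEXT_TYPES = new Set(["text/plain", "text/markdown"]);
const DOCX_TYPE =
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

// Picks the legacy extractor used when Docling is unavailable.
function legacyExtractorFor(mimeType: string, ex: Extractors): Extractor {
  if (mimeType === "application/pdf") return ex.legacyPdf;
  if (mimeType === DOCX_TYPE) return ex.legacyDocx;
  throw new Error(`No legacy extractor for ${mimeType}`);
}

async function extractText(
  data: Uint8Array,
  mimeType: string,
  ex: Extractors,
): Promise<string> {
  // Plain text and Markdown skip Docling entirely.
  if (PLAIN_TEXT_TYPES.has(mimeType)) {
    return new TextDecoder("utf-8").decode(data);
  }
  try {
    // Preferred path: Docling (table structure recognition, OCR).
    return await ex.docling(data, mimeType);
  } catch (err) {
    // Graceful fallback: any Docling failure (service down, timeout,
    // parse error) drops through to the legacy extractors below.
    console.warn(`Docling extraction failed, using legacy extractor: ${String(err)}`);
  }
  return legacyExtractorFor(mimeType, ex)(data, mimeType);
}
```

Catching every Docling error, rather than only network errors, keeps uploads working whenever docling-service is degraded, at the cost of silently producing lower-quality text; the warning log is what makes that visible.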

Changes:
- New: packages/services/docling-service/ (main.py, Dockerfile, requirements.txt)
- docker-compose.yml: add docling-service, wire DOCLING_SERVICE_URL to
  knowledge-service, add missing FILE_SERVICE_URL to conversation-service
- text-extraction.service.ts: inject ConfigService, add extractViaDocling()
  with automatic fallback to legacy extractors
- .env.example: add FILE_SERVICE_PORT/URL and DOCLING_SERVICE_PORT/URL
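A minimal sketch of what an `extractViaDocling()` call could look like, assuming Node 18+ (global `fetch`, `FormData`, `Blob`, `URL`). The route `/parse`, the multipart field name `file`, and the `{ text }` response shape are hypothetical placeholders, not the actual docling-service contract; `baseUrl` stands for the value of DOCLING_SERVICE_URL that the real service reads through ConfigService.

```typescript
// Minimal subset of fetch used here, so a stub can be injected in tests;
// the global fetch (Node 18+) satisfies this type.
type FetchLike = (
  url: string,
  init: { method: "POST"; body: FormData },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical response shape; the real docling-service contract may differ.
interface DoclingResponse {
  text: string;
}

async function extractViaDocling(
  baseUrl: string, // value of DOCLING_SERVICE_URL
  file: Blob,
  filename: string,
  fetchImpl: FetchLike,
): Promise<string> {
  const form = new FormData();
  // The multipart field name "file" is an assumption.
  form.append("file", file, filename);
  // "/parse" is a hypothetical route on docling-service.
  const url = new URL("/parse", baseUrl).toString();
  // No Content-Type header is set: fetch adds multipart/form-data with its boundary.
  const res = await fetchImpl(url, { method: "POST", body: form });
  if (!res.ok) {
    // Throwing lets the caller fall back to pdf-parse/mammoth.
    throw new Error(`docling-service responded with HTTP ${res.status}`);
  }
  const body = (await res.json()) as Partial<DoclingResponse>;
  if (typeof body.text !== "string") {
    throw new Error("docling-service response has no text field");
  }
  return body.text;
}
```

Usage would be along the lines of `extractViaDocling(config.get("DOCLING_SERVICE_URL"), new Blob([bytes]), name, fetch)`. Because OCR on large PDFs can be slow, a production call would likely also want a timeout (for example via `AbortSignal.timeout`) so that a hung request still reaches the fallback path.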

Inter-service communication map:
  conversation-service → file-service (FILE_SERVICE_URL, attachments)
  conversation-service → knowledge-service (KNOWLEDGE_SERVICE_URL, RAG)
  knowledge-service → docling-service (DOCLING_SERVICE_URL, document parsing)
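Since this commit also had to add a FILE_SERVICE_URL that conversation-service was missing, the map above can be encoded as data and checked at startup. The sketch below is illustrative only: `SERVICE_DEPENDENCIES` and `resolveDependencies` are hypothetical names, not part of the repository, and the check simply fails fast when a required URL is absent or malformed.

```typescript
// Outbound dependencies per service, mirroring the inter-service map (illustrative).
const SERVICE_DEPENDENCIES = {
  "conversation-service": ["FILE_SERVICE_URL", "KNOWLEDGE_SERVICE_URL"],
  "knowledge-service": ["DOCLING_SERVICE_URL"],
} as const;

type ServiceName = keyof typeof SERVICE_DEPENDENCIES;

function resolveDependencies(
  service: ServiceName,
  env: Record<string, string | undefined>,
): Record<string, string> {
  const keys: readonly string[] = SERVICE_DEPENDENCIES[service];
  const resolved: Record<string, string> = {};
  const missing: string[] = [];
  for (const key of keys) {
    const value = env[key]?.trim();
    if (!value) {
      missing.push(key);
      continue;
    }
    const parsed = new URL(value); // throws TypeError if not an absolute URL
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      throw new Error(`${key} must be an http(s) URL, got ${value}`);
    }
    resolved[key] = value.replace(/\/+$/, ""); // normalize: drop trailing slashes
  }
  if (missing.length > 0) {
    throw new Error(`${service} is missing required env vars: ${missing.join(", ")}`);
  }
  return resolved;
}
```

Calling something like `resolveDependencies("knowledge-service", process.env)` during bootstrap would turn a missing wire-up in docker-compose.yml into an immediate startup error instead of a failure on the first upload.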

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 05:24:10 -08:00
adapters refactor(knowledge): separate file upload into independent entry point 2026-02-06 23:29:37 -08:00
application feat(knowledge): add Docling document parsing microservice 2026-02-07 05:24:10 -08:00
domain Initial commit: iConsulting Hong Kong immigration consulting AI customer service system 2026-01-09 00:01:12 -08:00
health fix(docker): add health check endpoints and fix IPv6 issue 2026-01-10 02:13:42 -08:00
infrastructure feat: add enterprise multi-tenancy infrastructure 2026-01-25 18:11:12 -08:00
knowledge feat(knowledge): add file upload with text extraction for knowledge base 2026-02-06 22:58:19 -08:00
memory refactor(services): implement 4-layer Clean Architecture for all backend services 2026-01-24 22:18:22 -08:00
migrations fix(db): add multi-tenancy migration for knowledge-service tables 2026-02-06 09:23:13 -08:00
app.module.ts fix(tenant): use TenantContextModule.forRoot() for global tenant context 2026-01-26 04:31:56 -08:00
main.ts fix(health): exclude /health endpoint from API prefix 2026-01-10 02:30:24 -08:00