Commit Graph

239 Commits

Author SHA1 Message Date
hailin 247ecd6c86 fix(llm-gateway): add NODE_TLS_REJECT_UNAUTHORIZED for proxy upstream
The Anthropic/OpenAI upstream proxy uses a self-signed certificate.
Disable TLS verification in the gateway container.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 23:22:26 -08:00
hailin 0114e9896d fix(admin-client): remove unused imports in llm-gateway components
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 23:13:40 -08:00
hailin cb8133585d chore: update pnpm lockfile for llm-gateway dependencies
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 22:54:46 -08:00
hailin 6476bd868f feat(llm-gateway): add external-facing LLM API proxy service with compliance injection, content moderation, and admin console
## New microservice: llm-gateway (port 3008)

Exposes an API fully compatible with Anthropic/OpenAI to external users and intercepts requests in the middle to implement:
- API key authentication: we issue keys to external users and store them as SHA-256 hashes
- System prompt injection: compliance content is injected before the request is forwarded (prepend/append supported)
- Content moderation: keyword/regex matching on user messages with three actions (block/warn/log)
- Usage logging: asynchronous batched writes tracking token consumption and estimated cost
- Audit logging: records each request's source IP, filter status, injection status, etc.
- Rate limiting: in-memory sliding-window RPM limit (see the sketch below)
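
A minimal sketch of what the in-memory sliding-window RPM limiter could look like; the class and field names are illustrative assumptions, not the gateway's actual code:

```typescript
// Hypothetical sliding-window limiter: one timestamp array per API key.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();

  constructor(private readonly windowMs = 60_000) {}

  /** Returns true if the key is still under its per-minute limit. */
  allow(apiKeyId: string, rpmLimit: number): boolean {
    const now = Date.now();
    const windowStart = now - this.windowMs;
    // Drop timestamps that have fallen out of the window.
    const recent = (this.hits.get(apiKeyId) ?? []).filter((t) => t > windowStart);
    if (recent.length >= rpmLimit) {
      this.hits.set(apiKeyId, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(apiKeyId, recent);
    return true;
  }
}
```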

### Technology choices
- Fastify (not NestJS): a pure proxy does not need a DI container; routing overhead is ~2ms
- SSE streaming pipeline: zero-buffering pass-through, supporting both Anthropic and OpenAI streaming
- Rule cache: 30-second TTL so rules are not re-queried from the database on every request (see the sketch below)
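
An illustrative 30-second TTL cache for gateway rules, assuming one loader callback per rule type; names are not taken from the actual llm-gateway code:

```typescript
// Hypothetical TTL cache wrapping a DB loader.
interface CacheEntry<T> {
  value: T;
  loadedAt: number;
}

class RuleCache<T> {
  private entry: CacheEntry<T> | null = null;

  constructor(
    private readonly loader: () => Promise<T>,
    private readonly ttlMs = 30_000,
  ) {}

  async get(): Promise<T> {
    const now = Date.now();
    if (this.entry && now - this.entry.loadedAt < this.ttlMs) {
      return this.entry.value; // fresh enough, skip the DB round-trip
    }
    const value = await this.loader();
    this.entry = { value, loadedAt: now };
    return value;
  }
}

// Usage (hypothetical): const injectionRules = new RuleCache(() => repo.findActiveRules());
```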

### API endpoints
- POST /v1/messages: Anthropic Messages API proxy (streaming and non-streaming)
- POST /v1/embeddings: OpenAI Embeddings API proxy
- POST /v1/chat/completions: OpenAI Chat Completions API proxy
- GET /health: health check

## Database (5 new tables)

- gateway_api_keys: external users' API keys (permissions, rate limits, budget, expiry)
- gateway_injection_rules: compliance injection rules (position, matched models, matched keys)
- gateway_content_rules: content moderation rules (keyword/regex, block/warn/log)
- gateway_usage_logs: token usage records (aggregated by key, model, provider)
- gateway_audit_logs: request audit logs (IP, filter status, injection status)

## Admin backend (conversation-service)

4 NestJS controllers mounted under /conversations/admin/gateway/:
- AdminGatewayKeysController: key CRUD + toggle
- AdminGatewayInjectionRulesController: injection rule CRUD + toggle
- AdminGatewayContentRulesController: content moderation rule CRUD + toggle
- AdminGatewayDashboardController: dashboard summary, usage queries, audit log queries

5 ORM entity files, one per new database table.

## Admin frontend (admin-client)

New features/llm-gateway module with a Tabs layout containing 5 management panels:
- API Key tab: create/delete/enable/disable keys; the full key is shown once, at creation time
- Injection rules tab: configure compliance content (prepended or appended to the system prompt)
- Content moderation tab: configure keyword/regex filter rules
- Usage statistics tab: view token consumption, cost, and response times
- Audit log tab: view request records, filter hits, injection status

Menu item: GatewayOutlined icon + "LLM 网关", placed between "系统总监" and "数据分析".

## Infrastructure

- docker-compose.yml: add the llm-gateway service definition
- kong.yml: add /v1/messages, /v1/embeddings, /v1/chat/completions routes
  - 300-second timeout (long LLM responses)
  - CORS: allow the X-Api-Key, anthropic-version, anthropic-beta headers
- init-db.sql: CREATE TABLE statements for the 5 new gateway tables

## Architecture notes

Internal services (conversation-service, knowledge-service, evolution-service) keep calling the API directly;
llm-gateway serves external users only. The two sides share configuration through the common PostgreSQL database.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 22:32:25 -08:00
hailin 021afd8677 fix(prompts): remove '无需雇主,自由就业' from QMAS overview template
User wants exact wording: "基本门槛条件12项满足6项即可申请" only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 08:28:29 -08:00
hailin e6e62b4fc6 fix(prompts): add verbatim template for immigration overview responses
When users ask "香港有哪些移民途径" or similar overview questions,
the AI must use exact standard descriptions for each category.
QMAS: "基本门槛条件12项满足6项即可申请" — no mention of 综合计分.
Explicit prohibition on adding extra scoring criteria in overview responses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 08:23:25 -08:00
hailin 17fb542292 fix(prompts): remove '综合计分≥80' from QMAS overview — only '12项门槛满足6项'
The QMAS brief description in the category comparison table (Section 10.7)
should only state the 12-item threshold requirement, not the composite score.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 08:14:50 -08:00
hailin 385621bbea fix(prompts): correct inaccurate QMAS/TTPS descriptions in the category overview table and Tier examples
The category comparison table (Section 10.7) and the Tier 1 example still carried the old wording:
- QMAS: "综合计分≥80" → "12项门槛满足6项 + 综合计分≥80"
- TTPS: "年薪250万港币" → "全年收入250万港币"; "百强大学" → "百强大学学士学位"
- Tier example "大专能申优才": add the 12-item threshold explanation

The AI consults this table when generating summaries for overview questions; without the fix it would keep emitting the old, inaccurate wording.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 08:11:36 -08:00
hailin 0ca3d3e922 fix(agents): align TTPS (高才通) Class A/B/C admission criteria strictly with official policy + fix GEP/TTPS naming mix-up
## Problem

Comparing the official policy text with the bot's answers revealed several inaccuracies in the TTPS Class A/B/C descriptions:
- Class A: "年薪" (annual salary) should be "全年收入" (total annual income, including base salary, bonuses, allowances, stock options, etc.)
- Class B: missing the "bachelor's degree" qualifier and the "within the five years before application" time window
- Class C: missing the "bachelor's degree" and "within five years" qualifiers; the exclusion clause and the first-come-first-served quota allocation were absent entirely
- System-wide GEP/TTPS naming mix-up (GEP = General Employment Policy vs TTPS = Top Talent Pass Scheme)

## Changes (3-layer safeguard)

### Layer 1: Assessment Expert Prompt (assessment-expert-prompt.ts)
- Fix category list naming: GEP/TTPS → TTPS (Top Talent Pass Scheme), TTPS/GEP → GEP (General Employment Policy)
- Class A: define "total annual income" explicitly (base salary + bonuses + allowances + exercised options); no school or industry restriction
- Class B: require a bachelor's degree (not just any degree) and three years of experience accumulated within the five years before application
- Class C: require a bachelor's degree awarded within the five years before application, add the exclusion clause (non-local graduates already in Hong Kong are not eligible),
  note the first-come-first-served allocation, and redirect such users to IANG
- New assessment output requirement: subClass (A/B/C/none) must be stated

### Layer 2: Coordinator System Prompt (coordinator-system-prompt.ts)
- DEFAULT_CATEGORIES naming fix: swap GEP and TTPS into their correct mapping
- Section 10.2 fully rewritten: A/B/C condition table strictly following the official policy
- New "key details" block: 5 points every answer must get right
- New FAQ entries: "does a master's degree count?", "what counts as income?"
- Section 10.4 naming fix: (GEP/TTPS) → (GEP)

### Layer 3: Code-level Post-validation (immigration-tools.service.ts)
- Step 4.6: TTPS post-validation logic (safety net); see the sketch below
  - Class C exclusion check: scan highlights/concerns for the keywords "在港" / "IANG" / "香港院校"
  - Missing subClass warning: when eligible=true but no A/B/C is specified, automatically add a concern
  - Tolerate the naming-confusion period: check both the 'TTPS' and 'GEP' category
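
A rough sketch of such a post-validation safety net, assuming a parsed result shape like the one below; the interface and function names are illustrative, not the actual immigration-tools.service.ts code:

```typescript
// Hypothetical shape of a parsed TTPS assessment result.
interface TtpsAssessment {
  category: string; // 'TTPS' or, during the naming-confusion period, 'GEP'
  eligible: boolean;
  subClass?: 'A' | 'B' | 'C' | 'none';
  highlights: string[];
  concerns: string[];
}

const EXCLUSION_KEYWORDS = ['在港', 'IANG', '香港院校'];

function postValidateTtps(result: TtpsAssessment): TtpsAssessment {
  // Tolerate the naming-confusion period: treat both labels as TTPS.
  if (result.category !== 'TTPS' && result.category !== 'GEP') return result;

  // Class C exclusion check: flag answers that mention in-Hong-Kong non-local
  // graduates without applying the exclusion clause.
  const text = [...result.highlights, ...result.concerns].join(' ');
  const mentionsExcludedGroup = EXCLUSION_KEYWORDS.some((kw) => text.includes(kw));
  if (result.subClass === 'C' && mentionsExcludedGroup) {
    result.concerns.push('C类排除条款:在港非本地毕业生不适用,建议考虑IANG');
  }

  // Missing subClass warning when the result claims eligibility.
  if (result.eligible && (!result.subClass || result.subClass === 'none')) {
    result.concerns.push('评估未标注高才通子类别 (A/B/C),请人工复核');
  }
  return result;
}
```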

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 07:58:38 -08:00
hailin 60a74fc3b0 feat(agents): enforce QMAS 12-item eligibility threshold with 3-layer validation
Implements system-level enforcement of the 12 basic eligibility threshold criteria from the
QMAS policy updated on 2024-11-01 (at least 6 of the 12 must be met to qualify for an application).

**Layer 1: Assessment Expert Prompt (assessment-expert-prompt.ts)**
- QMAS assessment gains a mandatory 3-step flow: threshold check → achievement-based points test → general points test
- All 12 threshold criteria are listed one by one with their decision basis (age ≤ 50; master's/PhD; STEM;
  bilingual ability; English proficiency; ≥5 years of work experience; ≥3 years at a multinational or well-known company;
  ≥3 years in a designated sector; ≥2 years of international experience; annual income ≥ HKD 1M;
  business entity with profit ≥ HKD 5M; listed company)
- Each item is judged met/not_met/unknown; unknown does not count as met
- Threshold not passed → eligible=false, score capped at 29
- Output JSON gains a structured thresholdCheck field (items array + metCount + passed)

**Layer 2: Code-level post-validation (immigration-tools.service.ts)**
- Step 4.5 safety net: after parsing the assessment result, verify QMAS thresholdCheck consistency (see the sketch below)
- Threshold not passed but score > 29 → automatically downgrade (score=29, eligible=false)
- Threshold passed but eligible=false and score > 29 → automatically correct to eligible=true
- thresholdCheck missing → log a warning
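
A sketch of the Step 4.5 consistency check, assuming a thresholdCheck shape like the one described above; names are assumptions about the real code:

```typescript
// Illustrative consistency correction for the QMAS thresholdCheck field.
interface ThresholdCheck {
  items: { name: string; status: 'met' | 'not_met' | 'unknown' }[];
  metCount: number;
  passed: boolean;
}

interface QmasAssessment {
  eligible: boolean;
  score: number;
  thresholdCheck?: ThresholdCheck;
}

function enforceQmasThreshold(a: QmasAssessment, warn: (msg: string) => void): QmasAssessment {
  if (!a.thresholdCheck) {
    warn('QMAS assessment missing thresholdCheck; skipping consistency correction');
    return a;
  }
  if (!a.thresholdCheck.passed && a.score > 29) {
    // Threshold failed: cap the score and mark ineligible.
    return { ...a, score: 29, eligible: false };
  }
  if (a.thresholdCheck.passed && !a.eligible && a.score > 29) {
    // Threshold passed with a high score: the eligible flag is inconsistent, fix it.
    return { ...a, eligible: true };
  }
  return a;
}
```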

**Layer 3: Coordinator System Prompt (coordinator-system-prompt.ts)**
- Section 10.1 gains a "基本门槛(2024年11月更新)" subsection
- States explicitly that applicants who fail the threshold are not eligible (even with 80+ points on the points test)
- Updates 4 FAQ quick answers to incorporate the threshold criteria

**Data collection enhancement (collection-expert-prompt.ts)**
- 3 additional fields: company_type, business_ownership, listed_company
- QMAS category mapping extended to cover all data points needed by the 12 threshold criteria

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 07:40:47 -08:00
hailin 1f6d473649 feat(admin): add multimodal image paste support to all admin chat interfaces
Admins can now paste images from the clipboard in the 3 admin chat interfaces
(系统总监, 评估指令, 收集指令) for multimodal conversations with the admin agents.

**New file:**
- `shared/hooks/useImagePaste.ts`: shared hook that handles clipboard image paste,
  base64 conversion, pending-image management, and multimodal content block construction (see the sketch below)
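
A minimal sketch of what such a clipboard-paste hook could look like; the real useImagePaste.ts may differ in API shape:

```tsx
import { useCallback, useState } from 'react';
import type { ClipboardEvent } from 'react';

export interface PendingImage {
  mediaType: string;
  base64: string; // raw base64, without the data: URL prefix
}

export function useImagePaste() {
  const [pendingImages, setPendingImages] = useState<PendingImage[]>([]);

  const onPaste = useCallback((e: ClipboardEvent) => {
    for (const item of Array.from(e.clipboardData.items)) {
      if (!item.type.startsWith('image/')) continue;
      const file = item.getAsFile();
      if (!file) continue;
      const reader = new FileReader();
      reader.onload = () => {
        const dataUrl = reader.result as string;
        const base64 = dataUrl.split(',')[1]; // strip the data URL header
        setPendingImages((prev) => [...prev, { mediaType: item.type, base64 }]);
      };
      reader.readAsDataURL(file);
    }
  }, []);

  /** Build Anthropic-style multimodal content blocks from text plus pending images. */
  const buildContent = useCallback(
    (text: string) => [
      ...pendingImages.map((img) => ({
        type: 'image' as const,
        source: { type: 'base64' as const, media_type: img.mediaType, data: img.base64 },
      })),
      { type: 'text' as const, text },
    ],
    [pendingImages],
  );

  return { pendingImages, setPendingImages, onPaste, buildContent };
}
```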

**Backend changes (conversation-service):**
- 3 admin chat services (system-supervisor-chat, directive-chat,
  collection-directive-chat): the chat() method parameter type changes from `content: string`
  to `content: Anthropic.MessageParam['content']` so it can accept image blocks
- 3 admin controllers (admin-supervisor, admin-assessment-directive,
  admin-collection-directive): the DTO content type becomes `any` to pass the
  frontend's multimodal content through untouched

**Frontend changes (admin-client):**
- 3 API type files: ChatMessage.content widened to
  `string | ContentBlock[]`
- SupervisorPage: integrates the useImagePaste hook, adds onPaste handling,
  pending-image preview (64x64 thumbnails + delete button), and image rendering inside messages
- DirectiveChatDrawer: same, with 48x48 thumbnails to fit the Drawer width
- CollectionChatDrawer: same as above

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 21:18:57 -08:00
hailin 3b6e1586b7 fix(admin): add Markdown rendering to assessment & collection chat drawers
Both directive chat drawers were rendering AI responses as plain text.
Apply the same ReactMarkdown + remark-gfm treatment used in supervisor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 21:03:22 -08:00
hailin 12e622040a fix(admin): add Markdown rendering to System Supervisor chat
Supervisor responses contain rich Markdown (tables, headers, bold,
lists, code). Previously rendered as plain text with pre-wrap.

- Install react-markdown + remark-gfm for GFM table support
- Wrap assistant messages in ReactMarkdown component
- Add .supervisor-markdown CSS styles (tables, headings, lists, hr, code)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 21:01:20 -08:00
hailin 5034ef4a70 feat(admin): add System Supervisor — global system status chat interface
Add a "系统总监" (System Supervisor) feature that provides admins with
a natural language chat interface to query the entire iConsulting system's
operational status, including all 7 specialist agents, directives, token
usage, conversation statistics, and system health.

Backend:
- SystemSupervisorChatService: Haiku 4.5 with 7 read-only tools
  - get_agent_configs: list all 7 agent model/parameter configs
  - get_agent_execution_stats: execution counts, success rates, latency
  - get_directives_summary: assessment + collection directive overview
  - get_token_usage_stats: token consumption and cost by model
  - get_conversation_stats: conversation counts, conversion rates, stages
  - get_evaluation_rules: quality gate rule configuration
  - get_system_health: circuit breakers, Redis, service availability
- AdminSupervisorController: POST /conversations/admin/supervisor/chat
- Registered in AgentsModule (provider + export) and ConversationModule
- Added AgentExecutionORM to TypeOrmModule.forFeature in AgentsModule
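
As a rough illustration, one of these read-only tools might be declared for the Anthropic Messages API roughly like this; the schema is an assumption, not the actual SystemSupervisorChatService code:

```typescript
// Hypothetical declaration of one of the seven read-only supervisor tools,
// in the shape the Anthropic Messages API expects in its `tools` array.
const getTokenUsageStatsTool = {
  name: 'get_token_usage_stats',
  description: 'Read-only: token consumption and estimated cost, grouped by model.',
  input_schema: {
    type: 'object' as const,
    properties: {
      days: { type: 'number', description: 'Look-back window in days.' },
    },
    required: [],
  },
};

// The chat service would pass definitions like this in the tools array of a
// messages.create() call and dispatch on returned tool_use blocks by name,
// running only read-only DB queries.
```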

Frontend (admin-client):
- features/supervisor/ with Clean Architecture layers:
  - infrastructure/supervisor.api.ts: HTTP client
  - application/useSupervisor.ts: React Query mutation hook
  - presentation/pages/SupervisorPage.tsx: full-page chat UI
- Quick action buttons: 系统概况, Agent统计, 成本报告, 健康检查
- Route: /supervisor, menu icon: EyeOutlined (between 收集指令 and 数据分析)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 20:49:57 -08:00
hailin 8e4bd95dda feat(agents): add Collection Expert specialist + admin directive system
## Part A: Collection Expert Specialist Agent (7th specialist)
- New specialist: CollectionExpertService (Haiku 4.5, maxTurns 2)
  - Analyzes user info completeness against 12-item weighted checklist
  - Identifies missing fields, recommends next questions
  - Category-specific priority adjustments (QMAS/GEP/IANG/TTPS/CIES/TechTAS)
  - Tools: search_knowledge, get_user_context
  - Admin directive injection: loads active directives from DB before each run
- Prompt: collection-expert-prompt.ts (completeness calc, validation rules, JSON output)
- Coordinator integration: invoke_collection_expert tool + case in executeAgentTool
- System prompt: section 2.6 usage guide, section 4.3 optional invocation reference

## Part B: Admin Directive System (parallel to assessment directives)
- ORM: CollectionDirectiveORM (collection_directives table)
  - Types: general, priority, category, validation
  - Multi-tenant with tenant_id + enabled indexes
- SQL: CREATE TABLE collection_directives in init-db.sql
- Controller: /conversations/admin/collection-directives (10 REST endpoints)
  - CRUD + toggle + reset + preview + AI chat
- Chat Service: CollectionDirectiveChatService (Haiku 4.5 tool loop)
  - 5 tools: list/create/update/delete/reset directives
  - mutated flag for frontend cache invalidation

## Part C: Frontend Admin-Client
- Feature module: features/collection-config/ (5 files)
  - API client, React Query hooks, Config page, Chat drawer
  - Directive types: 通用指令/优先级调整/类别配置/验证规则
- Route: /collection-config in App.tsx
- Sidebar: FormOutlined icon, label '收集指令' in MainLayout.tsx

Files: 11 new, 9 modified | Backend + frontend compile clean

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 20:07:33 -08:00
hailin 22bca31690 feat(agents): add AI chat interface for directive management
Add a conversational chat drawer to the assessment config admin page,
allowing admins to manage assessment directives via natural language.

Backend:
- DirectiveChatService: Haiku 4.5 LLM with 5 tools (list, create,
  update, delete, reset) and iterative tool loop (max 5 turns)
- System prompt dynamically includes current directive state from DB
- POST /chat endpoint on admin-assessment-directive controller
- Registered in AgentsModule (global), injected via @Optional()

Frontend:
- DirectiveChatDrawer: Ant Design Drawer (480px) with message list,
  input box (Enter to send, Shift+Enter for newline), loading state
- useDirectiveChat hook: React Query mutation, auto-invalidates
  directive queries when response.mutated === true
- "AI 助手" button added to AssessmentConfigPage header

Files:
- NEW: agents/admin/directive-chat.service.ts (LLM tool-loop service)
- NEW: components/DirectiveChatDrawer.tsx (chat drawer UI)
- MOD: agents.module.ts (register + export DirectiveChatService)
- MOD: admin-assessment-directive.controller.ts (POST /chat endpoint)
- MOD: assessment-config.api.ts (chat API method + types)
- MOD: useAssessmentConfig.ts (useDirectiveChat hook)
- MOD: AssessmentConfigPage.tsx (AI button + drawer integration)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 19:17:07 -08:00
hailin 9feb03153b feat(agents): add admin assessment directive system for dynamic prompt injection
Admins can now write natural-language directives that get injected into the
assessment expert's system prompt. Directives are stored in DB, loaded per
execution, and support incremental additions, toggling, and full reset.

Backend:
- New assessment_directives table + ORM entity
- Admin CRUD API at /conversations/admin/assessment-directives
- buildAssessmentExpertPrompt() accepts optional adminDirectives param
- AssessmentExpertService loads active directives from DB before each execution
- Fail-safe: missing repo/tenant context → default prompt (no directives)

Frontend (admin-client):
- New "评估指令" page with table, create/edit modals, toggle switches
- Prompt preview panel showing assembled directive text
- Reset-to-default with confirmation
- React Query hooks for all CRUD operations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 18:21:21 -08:00
hailin b09538144c fix(security): make invoke_assessment_expert payment gate fail-closed
Previously the payment check on invoke_assessment_expert used
`&& this.paymentClient` which silently skipped the entire gate when
PaymentClientService was unavailable (DI failure / optional inject).
Now returns an explicit error when the payment service is unreachable,
preventing unpaid assessments from executing.
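
A simplified sketch of the fail-open versus fail-closed distinction described above; the client interface and method name are hypothetical:

```typescript
// Hypothetical payment gate around invoke_assessment_expert.
interface PaymentClient {
  hasPaidOrder(userId: string): Promise<boolean>;
}

// Before (fail-open): if paymentClient was not injected, the whole check was skipped.
async function gateFailOpen(paymentClient: PaymentClient | null, userId: string) {
  if (paymentClient && !(await paymentClient.hasPaidOrder(userId))) {
    return { error: 'payment_required' };
  }
  return null; // proceeds even when paymentClient is missing
}

// After (fail-closed): an unreachable payment service blocks the assessment.
async function gateFailClosed(paymentClient: PaymentClient | null, userId: string) {
  if (!paymentClient) {
    return { error: 'payment_service_unavailable' };
  }
  if (!(await paymentClient.hasPaidOrder(userId))) {
    return { error: 'payment_required' };
  }
  return null;
}
```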

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 17:46:04 -08:00
hailin f06ed09c3c feat(agents): v2.0 interruptible assessment with abort chain + admin toggle
Full abort signal chain: Gateway → ConversationService → Coordinator →
ToolExecutor → BaseSpecialist → Claude API stream. Admin can toggle between
v1 (post-completion re-evaluation) and v2 (interruptible) via REST API.

Changes:
- Gateway: add cancel_stream WebSocket handler + active stream tracking
- Gateway: abort active stream on client disconnect
- ConversationService: accept + forward AbortSignal
- CoordinatorAgentService: link external AbortSignal to internal controller,
  thread through tool executor, read assessment mode from Redis feature flag
- BaseSpecialistService: hard abort (throw) instead of soft break,
  add abort signal to Promise.race in callClaude(), abort stream on cancel
- ImmigrationToolsService: thread abortSignal to assessment expert
- AdminObservabilityController: GET/PUT feature-flags/assessment-mode
  (Redis-backed, defaults to v1)

v1 and v2 coexist — admin controls which mode is active.
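
A sketch of two core pieces of such an abort chain: linking the external AbortSignal into an internal controller, and racing long-running work against the abort. Names are illustrative, not the actual service code:

```typescript
// Chain an optional external signal into an internal controller.
function linkAbort(external: AbortSignal | undefined): AbortController {
  const internal = new AbortController();
  if (external) {
    if (external.aborted) internal.abort();
    else external.addEventListener('abort', () => internal.abort(), { once: true });
  }
  return internal;
}

// Hard abort: reject the in-flight work instead of silently breaking out of the loop.
async function runWithAbort<T>(work: Promise<T>, signal: AbortSignal): Promise<T> {
  if (signal.aborted) throw new Error('aborted');
  const aborted = new Promise<never>((_, reject) =>
    signal.addEventListener('abort', () => reject(new Error('aborted')), { once: true }),
  );
  return Promise.race([work, aborted]);
}
```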

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 17:39:57 -08:00
hailin aa9f31ff20 feat(agents): v1.0 post-completion re-evaluation with forceReassess parameter
When users correct or update personal info after assessment completion,
Coordinator can now re-run run_professional_assessment with forceReassess: true
to bypass the 30-day dedup and produce an updated report.

Changes:
- Add forceReassess boolean param to run_professional_assessment tool definition
- Skip already_assessed check when forceReassess=true in handler
- Add prompt rules for identifying info corrections and triggering re-evaluation
- Document the re-evaluation flow in sections 3.5 and 4.4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 17:30:24 -08:00
hailin a72e718510 fix(agents): add payment gate on invoke_assessment_expert + progress streaming for assessment
Two hardening fixes for the professional assessment pipeline:
1. Code-level payment verification before dispatching invoke_assessment_expert
   (prevents bypassing the prompt-only gate)
2. Thread onProgress callback through direct tool chain so run_professional_assessment
   streams agent_progress events during the 30-45s assessment expert execution

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 17:16:28 -08:00
hailin e809740fdb feat(agents): add run_professional_assessment tool with payment gate + artifact persistence
Replaces ad-hoc assessment flow with structured pipeline:
- Code-level payment verification (checks PAID ASSESSMENT order)
- Info completeness validation (age, nationality, education, work exp)
- Assessment expert invocation with result parsing
- Automatic persistence as UserArtifact (assessment_report type)
- 30-day dedup (existing report within 30 days returns cached)
- Frontend rendering for all status codes (completed, payment_required,
  info_incomplete, already_assessed, error)
- System prompt updated to mandate new tool for paid assessments
- Post-assessment auto-generation of checklist + timeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 17:01:56 -08:00
hailin 95f36752c9 feat(agents): add prompt-driven execution tools with DB persistence
Add 4 new tools (generate_document, manage_checklist, create_timeline,
query_user_artifacts) enabling the agent to create and manage persistent
user artifacts. Artifacts are saved to PostgreSQL and support dedup by
title, update-in-place, and cross-session querying. Frontend renders
rich UI cards for each artifact type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:35:08 -08:00
hailin 85c78b0775 feat(admin): add system observability dashboard with circuit breaker monitoring
Backend: expose circuit breaker status via new AdminObservabilityController
(health, circuit-breakers, redis endpoints). Frontend: new observability
feature in admin-client with auto-refreshing status cards.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 05:28:24 -08:00
hailin 0d488ac68b feat(agents): add Redis checkpoint for agent loop crash recovery
- New RedisClientService: optional ioredis wrapper, gracefully degrades without REDIS_URL
- New RedisModule: global NestJS module providing Redis connectivity
- AgentCheckpoint interface: captures turn, messages, cost, agents, timestamp
- Agent loop saves checkpoint after each tool execution batch (TTL=10min)
- On restart with same conversationId+requestId, loads checkpoint and resumes from saved state
- Checkpoint auto-deleted after load to prevent stale recovery
- Coordinator injects @Optional() RedisClientService, builds save/load callbacks
- Zero impact when Redis is not configured — checkpoint silently skipped
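
An illustrative checkpoint save/load flow with the 10-minute TTL and delete-after-load behavior described above; the key layout and field names are assumptions:

```typescript
import Redis from 'ioredis';

interface AgentCheckpoint {
  turn: number;
  messages: unknown[];
  costUsd: number;
  agentsInvoked: string[];
  timestamp: number;
}

const CHECKPOINT_TTL_SECONDS = 600; // 10 minutes

function checkpointKey(conversationId: string, requestId: string): string {
  return `agent:checkpoint:${conversationId}:${requestId}`;
}

async function saveCheckpoint(redis: Redis, convId: string, reqId: string, cp: AgentCheckpoint) {
  await redis.set(checkpointKey(convId, reqId), JSON.stringify(cp), 'EX', CHECKPOINT_TTL_SECONDS);
}

async function loadCheckpoint(redis: Redis, convId: string, reqId: string): Promise<AgentCheckpoint | null> {
  const key = checkpointKey(convId, reqId);
  const raw = await redis.get(key);
  if (!raw) return null;
  await redis.del(key); // delete after load to prevent stale recovery
  return JSON.parse(raw) as AgentCheckpoint;
}
```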

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 04:27:25 -08:00
hailin 1a1573dda3 feat(resilience): add circuit breaker for downstream services
- New CircuitBreaker class: CLOSED → OPEN → HALF_OPEN three-state model
- Zero external dependencies, ~90 lines, fail-open semantics
- KnowledgeClientService: threshold=5, cooldown=60s, protects all 9 endpoints
- PaymentClientService: threshold=3, cooldown=30s, protects all 7 endpoints
- Both services refactored to use protectedFetch() — cleaner code, fewer try-catch
- Replaces verbose per-method error handling with centralized circuit breaker
- When tripped: returns null/empty fallback instantly, no network call
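
A minimal three-state breaker in the spirit described above; thresholds and naming are illustrative, not the actual ~90-line class:

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class CircuitBreaker {
  private state: BreakerState = 'CLOSED';
  private failures = 0;
  private openedAt = 0;

  constructor(private readonly threshold = 5, private readonly cooldownMs = 60_000) {}

  async exec<T>(call: () => Promise<T>, fallback: T): Promise<T> {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt < this.cooldownMs) return fallback; // fail fast, no network call
      this.state = 'HALF_OPEN'; // allow one trial call after the cooldown
    }
    try {
      const result = await call();
      this.state = 'CLOSED';
      this.failures = 0;
      return result;
    } catch {
      this.failures += 1;
      if (this.state === 'HALF_OPEN' || this.failures >= this.threshold) {
        this.state = 'OPEN';
        this.openedAt = Date.now();
      }
      return fallback; // fail-open semantics: return a fallback instead of throwing
    }
  }
}
```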

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 04:21:30 -08:00
hailin 2ebc8e6da6 feat(agents): stream specialist agent progress to frontend
- Convert BaseSpecialistService.callClaude() from sync .create() to streaming .stream()
- Add onProgress callback to SpecialistExecutionOptions for real-time text delta reporting
- All 6 specialist convenience methods now accept optional options parameter
- Coordinator creates throttled progress callback (every 300 chars) pushing agent_progress events
- Agent loop drains accumulated progress events after each tool execution batch
- WebSocket gateway forwards agent_progress events to frontend
- Progress event sink shared between tool executor and agent loop via closure
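
A sketch of a progress callback throttled by character count, as described above; the event shape is assumed:

```typescript
interface ProgressEvent {
  type: 'agent_progress';
  agent: string;
  textSoFar: string;
}

function createThrottledProgress(
  agent: string,
  push: (event: ProgressEvent) => void,
  everyChars = 300,
): (textSoFar: string) => void {
  let lastReported = 0;
  return (textSoFar: string) => {
    if (textSoFar.length - lastReported < everyChars) return; // throttle: only report every N chars
    lastReported = textSoFar.length;
    push({ type: 'agent_progress', agent, textSoFar });
  };
}
```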

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 04:17:37 -08:00
hailin 198ff4b349 feat(agents): add PreToolUse/PostToolUse hook system for tool call interception
- New ToolHooksService with dynamic hook registration (pre/post)
- Built-in audit logging: tool name, type, user, duration, success/failure
- Fail-open design: individual hook failures don't block tool execution
- Integrated into coordinator's createToolExecutor with full context
- Hook context includes: toolName, toolType (agent/direct/mcp), traceId, timing
- Supports future extensions: rate limiting, permission checks, analytics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 04:09:14 -08:00
hailin 02ee4311dc feat(observability): add trace ID propagation across agent pipeline
- Extend TenantStore with traceId + traceStartTime in AsyncLocalStorage
- Generate traceId (UUID-12) at WebSocket gateway entry point
- Propagate traceId through AgentLoopParams → agentLoop → specialists
- Add [trace:xxx] prefix to all logger calls in agent-loop, coordinator, and specialists
- Replace console.log with NestJS Logger in ConversationGateway
- Include traceId in stream_start event for frontend correlation
- Add traceId to AgentExecutionRecord and BaseStreamEvent
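
A sketch of trace ID propagation via AsyncLocalStorage along these lines; the store shape and helper names are assumptions, not the actual TenantStore code:

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

interface TenantStore {
  tenantId: string;
  traceId: string;
  traceStartTime: number;
}

const storage = new AsyncLocalStorage<TenantStore>();

// At the WebSocket gateway entry point: generate a short trace ID and run the
// whole request pipeline inside the store context.
function handleIncomingMessage(tenantId: string, handler: () => Promise<void>) {
  const store: TenantStore = {
    tenantId,
    traceId: randomUUID().replace(/-/g, '').slice(0, 12), // 12-character ID derived from a UUID
    traceStartTime: Date.now(),
  };
  return storage.run(store, handler);
}

// Anywhere downstream (agent loop, coordinator, specialists): prefix log lines.
function logWithTrace(logger: { log(msg: string): void }, message: string) {
  const traceId = storage.getStore()?.traceId ?? 'no-trace';
  logger.log(`[trace:${traceId}] ${message}`);
}
```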

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 04:05:24 -08:00
hailin ea6dff3a4e feat(agents): add model identity protection — prompt rules + code-level output filter
Prevent AI from revealing underlying model name (Claude/GPT/etc.) under any
circumstance including jailbreak attempts. Two defense layers:
- Prompt: anti-jailbreak rules + "小艾引擎" branding in coordinator system prompt
- Code: sanitizeModelLeaks() regex filter in agent-loop.ts streaming output
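
An illustrative version of such an output filter; the real sanitizeModelLeaks() pattern list is not shown in the commit, so these regexes are assumptions:

```typescript
// Replace model-name leaks in streamed text with the product branding.
const MODEL_LEAK_PATTERNS: RegExp[] = [
  /claude(\s+(opus|sonnet|haiku))?/gi,
  /gpt-?[0-9o]+(\.\d+)?/gi,
  /\banthropic\b/gi,
  /\bopenai\b/gi,
];

function sanitizeModelLeaks(text: string, brandName = '小艾引擎'): string {
  return MODEL_LEAK_PATTERNS.reduce((out, pattern) => out.replace(pattern, brandName), text);
}
```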

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 13:02:22 -08:00
hailin 2a8a15fcb6 fix: resolve ClaudeModule DI crash + historical QR code display bug
1. ClaudeModule missing ConversationORM in TypeOrmModule.forFeature —
   ImmigrationToolsService now depends on ConversationORMRepository
   (added in query_user_profile), but ClaudeModule only had TokenUsageORM.
   Fix: add ConversationORM to ClaudeModule's TypeORM imports.

2. Historical messages show "支付创建失败" for payment QR codes —
   toolCall.result is stored as JSON string in DB metadata JSONB.
   Live streaming (useChat.ts) parses it correctly, but REST API
   load (chatStore.ts → MessageBubble.tsx) does not.
   Fix: normalize toolCall.result in ToolCallResult component —
   JSON.parse if string, pass through if already object.
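
A sketch of the normalization described in point 2; the helper name is hypothetical:

```typescript
// toolCall.result may arrive as a JSON string (REST-loaded history, stored in
// JSONB metadata) or as an already-parsed object (live streaming path).
function normalizeToolResult(result: unknown): unknown {
  if (typeof result !== 'string') return result; // live stream already parsed it
  try {
    return JSON.parse(result);
  } catch {
    return { raw: result }; // fall back to wrapping the raw string
  }
}
```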

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 12:24:15 -08:00
hailin 43d4102e1f feat(agents): add query_user_profile tool for user info lookup
Adds a query_user_profile tool so the AI agent can answer users' questions about their own data,
e.g. "这是我第几次咨询?", "你记得我的信息吗?", "我之前咨询过什么?"

## Background
When a user asked "这是我第几次跟你咨询?" the AI could not answer, because no tool
could query the user's historical consultation data.

## Approach: two-layer design

### Layer 1: passive context injection (Context Injector)
- context-injector.service.ts now injects the ConversationORM repo + TenantContextService
- buildConversationStatsBlock() automatically queries the user's cumulative consultation count
- Every conversation automatically injects `用户累计咨询次数: N 次(含本次对话)`
- Simple questions ("这是第几次?") can be answered straight from context with zero tool calls

### Layer 2: active tool call (query_user_profile)
When the user wants details, the AI calls this tool, which returns a full profile:
- Consultation stats: cumulative count, first/most recent consultation time, category distribution
- Recent conversations: title, category, and stage of the 10 most recent conversations
- User portrait: facts from system memory (education/age/occupation), preferences, intents
- Order stats: total orders, paid, pending payment

## Files changed
- agents.module.ts: add ConversationORM to the TypeORM imports
- coordinator-tools.ts: new query_user_profile tool definition (read-only)
- immigration-tools.service.ts: inject the ConversationORM repo + TenantContextService and
  implement queryUserProfile() (conversations + memory + orders queried in parallel; see the sketch below)
- coordinator-system-prompt.ts: section 3.3 adds tool documentation and usage guidance
- context-injector.service.ts: inject the repo; the conversation_stats block now includes the cumulative count
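
A sketch of the parallel lookup mentioned above, assuming hypothetical repository interfaces rather than the actual TypeORM repositories:

```typescript
interface ProfileDeps {
  conversations: {
    countByUser(userId: string): Promise<number>;
    recentByUser(userId: string, limit: number): Promise<unknown[]>;
  };
  memory: { factsForUser(userId: string): Promise<unknown[]> };
  orders: { statsForUser(userId: string): Promise<{ total: number; paid: number; pending: number }> };
}

async function queryUserProfile(deps: ProfileDeps, userId: string) {
  // Run the independent lookups concurrently instead of sequentially.
  const [consultationCount, recentConversations, memoryFacts, orderStats] = await Promise.all([
    deps.conversations.countByUser(userId),
    deps.conversations.recentByUser(userId, 10),
    deps.memory.factsForUser(userId),
    deps.orders.statsForUser(userId),
  ]);
  return { consultationCount, recentConversations, memoryFacts, orderStats };
}
```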

## Dependencies
- No circular dependency: uses TypeORM Repository<ConversationORM> directly (data access layer)
  rather than ConversationService (avoiding an AgentsModule ↔ ConversationModule cycle)
- TenantContextService is globally available, ensuring multi-tenant isolation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 12:17:23 -08:00
hailin 389f975e33 fix(payment): return paymentUrl from adapters, strip base64 from tool output
Alipay/WeChat adapters now return the source payment URL alongside the
QR base64. The generate_payment tool only returns paymentUrl (short text)
to Claude API — base64 qrCodeUrl is stripped to prevent AI from dumping
raw data:image into text responses. Frontend QRCodeSVG renders from
paymentUrl instead of base64.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 11:44:54 -08:00
hailin 6609e50100 fix(payment): add /api/v1 prefix to PaymentClientService URLs
payment-service uses setGlobalPrefix('api/v1'), so all routes are
under /api/v1/orders, /api/v1/payments, etc. PaymentClientService
was calling /orders directly, resulting in 404:

  Cannot POST /orders → 创建订单失败

Fixed all 7 endpoint URLs to include the /api/v1 prefix.
Same pattern as file-service fix (FILE_SERVICE_URL).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 11:26:26 -08:00
hailin 6767215f83 refactor(agents): remove Structured Output (Layer 2) to enable true streaming
Background:
Commit bb1a113 introduced a 4-layer response quality control system:
- Layer 1: System Prompt (1095 lines of detailed guidance)
- Layer 2: Structured Output (Zod schema → output_config)
- Layer 3: LLM-as-Judge (Haiku 4.5 scoring)
- Layer 4: Per-intent hard truncation (already removed in db8617d)

Problems with Layer 2 (Structured Output):
1. Blocks streaming: output_config forces the model to emit JSON, and JSON fragments
   cannot be shown to the user, so the whole response is buffered and emitted at once
2. Zod validation crashes frequently: the SDK throws when an intent enum value does not
   match; 4 hotfixes already (b55cd4b, db8617d, 7af8c4d, and this one)
3. The followUp field loses content: when the model splits the answer into followUp, the split-off part gets filtered out
4. The intent classification is only used for logging and adds no user-facing value
5. z.string() has no .max() constraint, so it does not actually limit answer length

After removal, answer quality is still guaranteed by the following mechanisms (all retained):
- Layer 1: System Prompt (intent classification table, answer style, length guidance)
- Layer 3: LLM-Judge (relevance/conciseness/noise scoring, with automatic retry on failure)
- API max_tokens: 2048 as a hard cap on output size

Changes:
- coordinator-agent.service.ts: remove the zodOutputFormat/CoordinatorResponseSchema
  import and the outputConfig parameter
- agent-loop.ts: remove the outputConfig guard in text_delta (text now streams directly),
  remove the output_config API parameter, remove the two Structured Output
  validation-failure recovery catch blocks, remove the JSON parse + safety net block
- agent.types.ts: remove the outputConfig field from the AgentLoopParams interface
- coordinator-response.schema.ts: empty out the Zod schema/helpers, keep a historical note

Result:
- Users now see token-by-token streaming output
- All Structured Output crash risk is eliminated
- Net code reduction of ~130 lines

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 11:15:48 -08:00
hailin 913a3fd375 fix(prompt): defer 99元 mention — never in first response to new users
## Problem
User asks "我适合哪种移民方式?" as their first message → AI immediately
mentions 99元 paid assessment. This is aggressive and off-putting for new
users who are just exploring.

## Root Cause
Intent table classified "适合什么" as assessment_request with instruction
to immediately mention 99元. This conflicts with the conversion philosophy
section that says "免费问答建立信任 → 付费评估".

## Fix (4 changes in coordinator-system-prompt.ts)

1. **Intent table**: assessment_request no longer says "immediately mention
   99元". Instead references new handling rules below the table.

2. **New "评估请求处理规则" section** (after intent table):
   - Early conversation + no user info → exploratory question, NOT
     assessment request. Collect info first, give initial direction.
   - User shared info + explicitly asks "做个评估" → real assessment
     request, mention 99元.
   - User shared info but didn't ask → give free initial direction,
     don't proactively mention payment.

3. **Assessment suggestion timing** (section 5.6):
   - Added 3 prerequisites before mentioning 99元:
     a. At least 3 key info items collected
     b. Already gave free initial direction (user felt value)
     c. Conversation has gone 3-4+ rounds
   - Added absolute prohibition: never mention 99元 in first response.

4. **Conversion boundary example**: Changed misleading "我适合走高才通吗
   → 需要评估" to nuanced guidance that distinguishes exploration from
   genuine assessment requests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 10:52:37 -08:00
hailin 7af8c4d8de fix(agents): graceful recovery from structured output validation errors
## Problem
SDK's Zod validation for `output_config` occasionally fails with:
  "Failed to parse structured output: invalid_value at path [intent]"
This crashes the entire response — user sees nothing despite model
generating a valid answer.

## Root Cause
The Anthropic SDK validates streamed structured output against the Zod
schema (CoordinatorResponseSchema) after streaming completes. When the
model returns an intent value not in the z.enum() (rare but happens),
the SDK throws during stream iteration or finalMessage().

## Fix
1. Catch "Failed to parse structured output" errors in both:
   - Stream iteration catch block (for-await loop)
   - stream.finalMessage() catch block
2. Recover by extracting accumulated text from assistantBlocks
3. Manual JSON.parse (skips Zod validation — intent enum mismatch
   doesn't affect user-facing content)
4. Yield parsed.answer + parsed.followUp normally

## Also Included (from previous commit)
- Removed INTENT_MAX_ANSWER_LENGTH hard truncation (it did more harm than good)
- Only 2000-char safety net remains for extreme edge cases
- followUp: non-question content always appended (prevents content loss)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 10:42:52 -08:00
hailin db8617dda8 refactor(agents): remove per-intent hard truncation, keep 2000-char safety net
Hard-coded INTENT_MAX_ANSWER_LENGTH limits caused mid-sentence truncation and
content loss. Length control now relies on prompt + schema description + LLM-Judge
(3 layers). Only a 2000-char safety net remains for extreme edge cases.

Also simplified followUp: non-question followUp is now always appended (prevents
model content split from silently dropping text).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 10:32:53 -08:00
hailin b55cd4bc1e fix(agents): widen answer length limits and preserve followUp continuations
INTENT_MAX_ANSWER_LENGTH was too tight (objection_expression 200 chars truncated
good responses). Bumped all limits ~25-50%. Also fixed followUp filter that silently
dropped content when model split answer across answer+followUp fields — now appends
followUp as continuation when answer ends mid-sentence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 10:25:59 -08:00
hailin 366a9cda3a feat(agents): tiered tool-calling system + KB coverage hint for smart routing
P0: Enrich Chapter 10 with detailed policy facts (QMAS scoring, GEP A/B/C
conditions, FAQ quick answers) so Claude can answer common questions directly
without tool calls. Replace absolute rule "never answer from memory" with
3-tier system: Tier 1 (direct from Ch10), Tier 2 (search_knowledge), Tier 3
(invoke_policy_expert).

P1: Context injector now always returns a kb_coverage_hint block — when KB has
results it tells Claude to prefer KB over web_search; when KB has no results
it suggests considering web_search. Web_search tool description updated to
reference the hint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 10:06:29 -08:00
hailin b7e84ba3b6 fix(agents): only run InputGate on first message to prevent mid-conversation misclassification
Short follow-up answers like "计算机,信息技术" were being classified as
OFF_TOPIC (0.85) because the InputGate has no conversation context. Now the
gate only runs when there are no previous messages (first message in conversation).
Mid-conversation topic management is handled by the Coordinator prompt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 09:43:33 -08:00
hailin f5820f9c7f feat(agents): add subtle conversion guidance to coordinator prompt
Add section 5.6 "隐性转化引导" with trust-first conversion philosophy:
- Free facts vs paid analysis boundary
- "Taste-then-sell" strategy with positive but vague hints
- Assessment suggestion limited to max once per conversation
- Natural urgency only when fact-supported
- Post-assessment → full service transition only when user asks
- Anti-annoyance red line: never make user feel pushed to pay

Recalibrate info exchange (4.3): warm acknowledgment without deep analysis.
Add value framing (4.4) and post-assessment guidance (4.5).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 09:33:52 -08:00
hailin fb966244bc fix(agents): enforce 99 RMB assessment fee — remove "free assessment" language
Update coordinator system prompt to enforce pricing rules:
- All assessments cost 99 RMB (one-time per user), no free assessments
- Must collect payment before calling assessment expert
- Add fee inquiry intent type to response strategy table
- Update generate_payment tool description with fixed pricing
- Replace "免费初步咨询" with tiered service model

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 09:22:37 -08:00
hailin 636b10b733 feat(web): auto-scroll on all state changes + completed agent badges auto-fade
Fix auto-scroll by adding missing dependencies (currentConversationId, isStreaming,
completedAgents). Completed agent badges now show for 2.5s then smoothly fade out
instead of accumulating, keeping the status area clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 09:12:32 -08:00
hailin af8aea6b03 feat(web): move agent status inline with typing indicator for better UX
Instead of showing agent status in a separate panel below the chat,
display it inline beneath the typing dots ("...") in the message flow.
The dots remain the primary waiting indicator; agent status appears
below as supplementary context during specialist agent invocations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 09:05:41 -08:00
hailin 40a0513b05 fix(agents): strip markdown code fences from InputGate Haiku response
Haiku sometimes returns JSON wrapped in ```json ... ``` code blocks,
causing JSON.parse to fail. Strip markdown fences before parsing.
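
A minimal version of the fix described above, with a hypothetical helper name:

```typescript
// Strip a leading/trailing Markdown code fence (three backticks, optionally
// tagged "json") before JSON.parse.
const FENCE_OPEN = /^`{3}(?:json)?\s*/i;
const FENCE_CLOSE = /`{3}\s*$/;

function parseJsonLoose(raw: string): unknown {
  const stripped = raw.trim().replace(FENCE_OPEN, '').replace(FENCE_CLOSE, '');
  return JSON.parse(stripped);
}
```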

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 08:59:19 -08:00
hailin fa835e4f56 fix(agents): preserve image content blocks in context injection — fixes 209K token overflow
injectIntoMessages() was JSON.stringify-ing array content (with image blocks),
turning base64 data into text tokens (~170K) instead of image tokens (~1,600).
Fix: append context as a new text block in the array, preserving image block format.

Also fixes token estimation to count images at ~1,600 tokens instead of base64 char length,
and adds debug logging for API call token composition.
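
A sketch of the block-preserving injection described above; the types and function name are illustrative, not the actual injectIntoMessages() signature:

```typescript
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'image'; source: { type: 'base64'; media_type: string; data: string } };

// When message content is a block array, append the injected context as an extra
// text block instead of stringifying the whole array.
function injectContext(content: string | ContentBlock[], contextText: string): string | ContentBlock[] {
  if (typeof content === 'string') {
    return `${content}\n\n${contextText}`;
  }
  // Preserving image blocks keeps them billed as image tokens (~1,600 each) rather
  // than as base64 text tokens (~170K in the overflow this commit fixes).
  return [...content, { type: 'text', text: contextText }];
}
```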

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 08:51:24 -08:00
hailin 7dc364d9b3 fix(agents): raise compaction threshold to 160K (80% of 200K limit)
80K was too aggressive and caused premature context loss. Now triggers
at 160K tokens with a target of 80K after compaction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 08:40:18 -08:00
hailin 033e476a6f fix(agents): wire up autoCompactIfNeeded to prevent token overflow
The auto-compaction logic (threshold 80K tokens, summarize older
messages via Haiku) existed but was never called in sendMessage flow.
Now called after context injection, before agent loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 08:29:36 -08:00
hailin f2903bd67a fix(agents): use text placeholders for historical attachments to avoid token overflow
Historical images/PDFs were being re-downloaded and base64-encoded for
every API call, causing 200K+ token requests. Now only the current
message includes full attachment blocks; historical ones use text
placeholders like "[用户上传了图片: photo.png]".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 08:22:02 -08:00