iconsulting/packages
hailin 04dbc61131 feat(agents): add capability boundary guardrails — input gate, cascading fallback, output gate rules
Four guardrail fixes, grouped into three areas, to enforce agent capability boundaries:

1. Cascading Fallback (Fix 1+4):
   - Rewrite searchKnowledge() in immigration-tools.service.ts with 3-tier fallback:
     KB (similarity >= 0.55) → Web Search → Built-in Knowledge (clearly labeled)
   - Rewrite executeTool() in policy-expert.service.ts to use retrieveKnowledge()
     with confidence threshold; returns [KB_EMPTY]/[KB_LOW_CONFIDENCE]/[KB_ERROR]
     markers so the model knows to label source reliability
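
A minimal sketch of the 3-tier fallback described above. Only the 0.55 threshold and the tier order come from this commit message. The type names, the function name `searchKnowledgeSketch`, the callback parameters, and the choice to treat a failed lookup as a miss are assumptions for illustration, not the actual code in immigration-tools.service.ts. The lookups are modeled as synchronous callbacks for brevity; the real services are presumably async.

```typescript
// Hypothetical shapes; the real types are not shown in this commit message.
interface KbHit {
  content: string;
  similarity: number;
}

type KnowledgeSource = 'KB' | 'WEB' | 'BUILT_IN';

interface KnowledgeResult {
  source: KnowledgeSource;
  content: string[];
}

// Threshold taken from the commit message: KB (similarity >= 0.55).
const KB_SIMILARITY_THRESHOLD = 0.55;

function searchKnowledgeSketch(
  query: string,
  searchKb: (q: string) => KbHit[],
  searchWeb: (q: string) => string[],
): KnowledgeResult {
  // Tier 1: knowledge base, keeping only hits at or above the threshold.
  let confident: KbHit[] = [];
  try {
    confident = searchKb(query).filter(
      (hit) => hit.similarity >= KB_SIMILARITY_THRESHOLD,
    );
  } catch {
    // Assumption: a KB failure is treated like an empty result.
  }
  if (confident.length > 0) {
    return { source: 'KB', content: confident.map((hit) => hit.content) };
  }

  // Tier 2: web search, reached only when the KB had nothing confident.
  let web: string[] = [];
  try {
    web = searchWeb(query);
  } catch {
    // Assumption: a web search failure also falls through to the next tier.
  }
  if (web.length > 0) {
    return { source: 'WEB', content: web };
  }

  // Tier 3: no external content; the caller must label the answer as
  // coming from the model's built-in knowledge.
  return { source: 'BUILT_IN', content: [] };
}
```

Returning the `source` tag alongside the content lets the caller decide how to label reliability, which is the purpose of the "clearly labeled" third tier.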

2. Input Gate (Fix 2):
   - New InputGateService using Haiku for lightweight pre-classification
   - Classifications: ON_TOPIC / OFF_TOPIC (threshold >= 0.7) / HARMFUL (>= 0.6)
   - Short messages (< 5 chars) fast-path to ON_TOPIC
   - A gate failure is non-fatal (the message is allowed through)
   - Integrated in CoordinatorAgentService.sendMessage() before agent loop entry
   - OFF_TOPIC/HARMFUL messages get fixed responses without entering agent loop
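
The gate rules above can be sketched as a small decision function. The labels, the two thresholds, the 5-character fast path, and the fail-open behavior come from this commit message. Reading the thresholds as minimum classifier confidence is an interpretation, and the names `gateMessage`, `Classification`, `GateDecision`, the injected `classify` callback (standing in for the Haiku call), and the reply strings are all hypothetical.

```typescript
type GateLabel = 'ON_TOPIC' | 'OFF_TOPIC' | 'HARMFUL';

// Hypothetical classifier output; the real response shape is not shown.
interface Classification {
  label: GateLabel;
  confidence: number;
}

interface GateDecision {
  allow: boolean;
  fixedResponse?: string;
}

// Values from the commit message.
const OFF_TOPIC_THRESHOLD = 0.7;
const HARMFUL_THRESHOLD = 0.6;
const MIN_CLASSIFY_LENGTH = 5;

// Placeholder replies; the actual fixed responses are not in the commit message.
const OFF_TOPIC_REPLY = 'This assistant only handles in-scope questions.';
const HARMFUL_REPLY = 'This request cannot be processed.';

// Wraps the classifier so that any failure yields null instead of throwing.
function tryClassify(
  message: string,
  classify: (m: string) => Classification,
): Classification | null {
  try {
    return classify(message);
  } catch {
    return null;
  }
}

function gateMessage(
  message: string,
  classify: (m: string) => Classification,
): GateDecision {
  // Fast path: very short messages skip classification entirely.
  if (message.length < MIN_CLASSIFY_LENGTH) {
    return { allow: true };
  }

  const result = tryClassify(message, classify);
  // Gate failure is non-fatal: let the message through.
  if (result === null) {
    return { allow: true };
  }

  if (result.label === 'HARMFUL' && result.confidence >= HARMFUL_THRESHOLD) {
    return { allow: false, fixedResponse: HARMFUL_REPLY };
  }
  if (result.label === 'OFF_TOPIC' && result.confidence >= OFF_TOPIC_THRESHOLD) {
    return { allow: false, fixedResponse: OFF_TOPIC_REPLY };
  }
  // ON_TOPIC, or a blocking label below its threshold.
  return { allow: true };
}
```

A caller in the position of sendMessage() would return `fixedResponse` directly when `allow` is false and only enter the agent loop otherwise.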

3. Output Gate Enhancement (Fix 3):
   - Add TOPIC_BOUNDARY and NO_FABRICATION to EvaluationRuleType
   - TOPIC_BOUNDARY: regex detection for code blocks, programming keywords,
     AI identity exposure, off-topic indicators in agent responses
   - NO_FABRICATION: detects policy claims without policy_expert invocation
     or source markers; ensures factual claims are knowledge-backed
   - Both rule types are admin-configurable (zero rules = zero checks)
   - No DB migration needed (ruleType is varchar(50))
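
As a rough illustration of the two new rule types, the sketch below checks an agent turn against whichever rules are configured. The rule type names, the `policy_expert` tool name, and the "zero rules = zero checks" behavior come from this commit message. Everything else is assumed: the regex patterns, the `[source:` marker format, and the names `Rule`, `AgentTurn`, and `evaluateTurn` are hypothetical and not taken from evaluation-gate.service.ts.

```typescript
type RuleType = 'TOPIC_BOUNDARY' | 'NO_FABRICATION';

// Hypothetical, reduced shapes for illustration only.
interface Rule {
  ruleType: RuleType;
}

interface AgentTurn {
  responseText: string;
  toolsInvoked: string[];
}

// Illustrative patterns; the real detection regexes are not shown.
const TOPIC_BOUNDARY_PATTERNS: RegExp[] = [
  /`{3}/, // fenced code block
  /\b(function|def|class|import)\s+\w+/, // programming keywords
  /\b(as an AI|language model)\b/i, // AI identity exposure
];

// Illustrative heuristic for "this sentence asserts a policy fact".
const POLICY_CLAIM = /\b(policy|regulation|law)\b[^.]*\b(requires|states|mandates)\b/i;
// Assumed marker format for a cited source.
const SOURCE_MARKER = /\[source:/i;

function violatesTopicBoundary(turn: AgentTurn): boolean {
  return TOPIC_BOUNDARY_PATTERNS.some((p) => p.test(turn.responseText));
}

function violatesNoFabrication(turn: AgentTurn): boolean {
  if (!POLICY_CLAIM.test(turn.responseText)) {
    return false; // no policy claim, nothing to back up
  }
  const backedByTool = turn.toolsInvoked.includes('policy_expert');
  const backedByMarker = SOURCE_MARKER.test(turn.responseText);
  return !backedByTool && !backedByMarker;
}

// Runs only the checks that have a configured rule; an empty rule list
// yields no violations because no evaluator is invoked.
function evaluateTurn(turn: AgentTurn, rules: Rule[]): RuleType[] {
  const violations: RuleType[] = [];
  for (const rule of rules) {
    const failed =
      rule.ruleType === 'TOPIC_BOUNDARY'
        ? violatesTopicBoundary(turn)
        : violatesNoFabrication(turn);
    if (failed && !violations.includes(rule.ruleType)) {
      violations.push(rule.ruleType);
    }
  }
  return violations;
}
```

Because the evaluators are driven by the configured rule list, an admin who defines no TOPIC_BOUNDARY or NO_FABRICATION rules gets no new checks, matching the behavior noted above.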

Files changed:
- NEW: agents/coordinator/input-gate.service.ts
- MOD: agents/coordinator/coordinator-agent.service.ts (inject InputGate + gate check)
- MOD: agents/agents.module.ts (register InputGateService)
- MOD: agents/coordinator/evaluation-gate.service.ts (2 new evaluators)
- MOD: domain/entities/evaluation-rule.entity.ts (2 new rule types)
- MOD: agents/specialists/policy-expert.service.ts (RAG confidence threshold)
- MOD: claude/tools/immigration-tools.service.ts (cascading fallback)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 21:59:10 -08:00
admin-client feat(mcp): add MCP Server management — backend API + admin UI 2026-02-06 18:29:02 -08:00
services feat(agents): add capability boundary guardrails — input gate, cascading fallback, output gate rules 2026-02-06 21:59:10 -08:00
shared feat(agents): implement multi-agent collaboration architecture 2026-02-06 04:26:39 -08:00
web-client feat(agents): implement multi-agent collaboration architecture 2026-02-06 04:26:39 -08:00