Four guardrail improvements to enforce agent capability boundaries:
1. Cascading Fallback (Fix 1+4):
- Rewrite searchKnowledge() in immigration-tools.service.ts with a 3-tier fallback:
  KB (similarity >= 0.55) → Web Search → Built-in Knowledge (clearly labeled)
- Rewrite executeTool() in policy-expert.service.ts to use retrieveKnowledge()
  with a confidence threshold; it returns [KB_EMPTY]/[KB_LOW_CONFIDENCE]/[KB_ERROR]
  markers so the model knows to label source reliability
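A minimal sketch of the fallback order described above, not the code in this commit. The type names, the injected searchKb/searchWeb functions, and the built-in tier's message text are assumptions for illustration; only the tier order, the 0.55 similarity cutoff, and the three marker strings come from this message. The marker helper takes its threshold as a parameter because the value used by executeTool() is not stated here.

```typescript
// Hypothetical shapes; the real service types are not shown in this commit.
type KnowledgeSource = "KB" | "WEB" | "BUILT_IN";
interface KbHit { text: string; similarity: number }
interface KnowledgeResult { source: KnowledgeSource; text: string }

const KB_MIN_SIMILARITY = 0.55; // cutoff stated in the commit message

// Tier 1: knowledge base, Tier 2: web search, Tier 3: built-in knowledge.
// A failing tier is treated like an empty one and the cascade continues.
async function searchKnowledgeSketch(
  query: string,
  searchKb: (q: string) => Promise<KbHit[]>,
  searchWeb: (q: string) => Promise<string[]>,
): Promise<KnowledgeResult> {
  try {
    const hits = (await searchKb(query)).filter(
      (h) => h.similarity >= KB_MIN_SIMILARITY,
    );
    if (hits.length > 0) {
      return { source: "KB", text: hits.map((h) => h.text).join("\n") };
    }
  } catch {
    // KB unavailable: fall through to web search
  }
  try {
    const snippets = await searchWeb(query);
    if (snippets.length > 0) {
      return { source: "WEB", text: snippets.join("\n") };
    }
  } catch {
    // Web search unavailable: fall through to built-in knowledge
  }
  return {
    source: "BUILT_IN",
    text: "No KB or web result; answer from built-in knowledge and label it as unverified.",
  };
}

type KbMarker = "[KB_EMPTY]" | "[KB_LOW_CONFIDENCE]" | "[KB_ERROR]";

// Maps a retrieval outcome to one of the markers; null means the KB result
// is usable as-is.
function kbMarker(outcome: KbHit[] | Error, threshold: number): KbMarker | null {
  if (outcome instanceof Error) return "[KB_ERROR]";
  if (outcome.length === 0) return "[KB_EMPTY]";
  return outcome.some((h) => h.similarity >= threshold)
    ? null
    : "[KB_LOW_CONFIDENCE]";
}
```

Passing the two search functions in as parameters keeps the cascade testable without a live KB or web backend.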
2. Input Gate (Fix 2):
- New InputGateService using Haiku for lightweight pre-classification
- Classifications: ON_TOPIC / OFF_TOPIC (confidence >= 0.7) / HARMFUL (confidence >= 0.6)
- Short messages (< 5 chars) fast-path to ON_TOPIC
- Gate failure is non-fatal: the gate fails open and lets the message through
- Integrated in CoordinatorAgentService.sendMessage() before agent loop entry
- OFF_TOPIC/HARMFUL messages get fixed responses without entering agent loop
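The gate rules above can be sketched as a single decision function. This is an illustration under assumptions, not the InputGateService code: the classifier is passed in as a synchronous function (the real one calls Haiku and is presumably async), and the Classification shape and function name are hypothetical. The thresholds, the 5-character fast path, and the fail-open behavior come from this message.

```typescript
type GateLabel = "ON_TOPIC" | "OFF_TOPIC" | "HARMFUL";
// Hypothetical classifier output: a label plus a confidence in [0, 1].
interface Classification { label: GateLabel; confidence: number }

const OFF_TOPIC_THRESHOLD = 0.7;
const HARMFUL_THRESHOLD = 0.6;
const MIN_CLASSIFY_LENGTH = 5;

function gateDecision(
  message: string,
  classify: (m: string) => Classification,
): GateLabel {
  // Fast path: very short messages skip classification entirely.
  if (message.length < MIN_CLASSIFY_LENGTH) return "ON_TOPIC";
  try {
    const { label, confidence } = classify(message);
    if (label === "HARMFUL" && confidence >= HARMFUL_THRESHOLD) return "HARMFUL";
    if (label === "OFF_TOPIC" && confidence >= OFF_TOPIC_THRESHOLD) return "OFF_TOPIC";
    // Below-threshold classifications are treated as on-topic.
    return "ON_TOPIC";
  } catch {
    // Fail open: a classifier error must not block the user.
    return "ON_TOPIC";
  }
}
```

Only a result other than ON_TOPIC would short-circuit with a fixed response; everything else proceeds into the agent loop.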
3. Output Gate Enhancement (Fix 3):
- Add TOPIC_BOUNDARY and NO_FABRICATION to EvaluationRuleType
- TOPIC_BOUNDARY: regex detection for code blocks, programming keywords,
  AI identity exposure, off-topic indicators in agent responses
- NO_FABRICATION: detects policy claims without policy_expert invocation
  or source markers; ensures factual claims are knowledge-backed
- Both rule types are admin-configurable (zero rules = zero checks)
- No DB migration needed (ruleType is varchar(50))
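A rough sketch of how the two new rule types could be evaluated. The regex patterns, the AgentTurn shape, the "[Source: ...]" marker format, and the function names are all assumptions made for illustration; the actual evaluators in evaluation-gate.service.ts may differ. What the sketch takes from this message is the rule semantics and the "zero rules = zero checks" behavior.

```typescript
type RuleType = "TOPIC_BOUNDARY" | "NO_FABRICATION";
// Hypothetical view of one agent turn: the reply plus the tools it invoked.
interface AgentTurn { response: string; toolsInvoked: string[] }

// Illustrative patterns only; real rules are admin-configured.
const TOPIC_BOUNDARY_PATTERNS: RegExp[] = [
  /`{3}/,                                  // fenced code block
  /\b(?:console\.log|printf|npm install)\b/, // programming keywords
  /\bas an AI (?:language )?model\b/i,     // AI identity exposure
];
const POLICY_CLAIM = /\b(?:must|required|eligible|deadline)\b/i;
const SOURCE_MARKER = /\[Source:[^\]]*\]/; // assumed marker format

function violatesTopicBoundary(turn: AgentTurn): boolean {
  return TOPIC_BOUNDARY_PATTERNS.some((p) => p.test(turn.response));
}

// A policy-like claim is acceptable only if policy_expert was invoked
// or the response carries a source marker.
function violatesNoFabrication(turn: AgentTurn): boolean {
  return (
    POLICY_CLAIM.test(turn.response) &&
    !turn.toolsInvoked.includes("policy_expert") &&
    !SOURCE_MARKER.test(turn.response)
  );
}

// Only the configured rule types are checked; an empty list checks nothing.
function evaluateTurn(turn: AgentTurn, rules: RuleType[]): RuleType[] {
  return rules.filter((rule) =>
    rule === "TOPIC_BOUNDARY"
      ? violatesTopicBoundary(turn)
      : violatesNoFabrication(turn),
  );
}
```

None of the patterns use the global flag, so repeated test() calls carry no state between turns.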
Files changed:
- NEW: agents/coordinator/input-gate.service.ts
- MOD: agents/coordinator/coordinator-agent.service.ts (inject InputGate + gate check)
- MOD: agents/agents.module.ts (register InputGateService)
- MOD: agents/coordinator/evaluation-gate.service.ts (2 new evaluators)
- MOD: domain/entities/evaluation-rule.entity.ts (2 new rule types)
- MOD: agents/specialists/policy-expert.service.ts (RAG confidence threshold)
- MOD: claude/tools/immigration-tools.service.ts (cascading fallback)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>