- Flutter: language='auto' omits the language field → backend receives none
- Backend: no language field → passes undefined to STT service
- STT service: language=undefined → omits language param from Whisper request
- Whisper auto-detects language per utterance when no hint is provided
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Node 18 native fetch (undici) ignores https.Agent, causing fetch failed
on the self-signed proxy at 67.223.119.33:8443. Switch to https.request
with rejectUnauthorized: false which works reliably.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OPENAI_BASE_URL=https://67.223.119.33:8443/v1 already includes /v1,
so the URL was being built as .../v1/v1/audio/transcriptions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
multer was only transitively available; pnpm strict mode blocks it.
Also adds @types/multer for TypeScript compilation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add POST /api/v1/agent/transcribe endpoint (STT only, no agent trigger)
- Add transcribeAudio() to chat datasource and provider
- VoiceMicButton now fills the text input field with transcript;
user reviews and sends manually
- Add OPENAI_API_KEY/OPENAI_BASE_URL to agent-service in docker-compose
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New endpoint: POST /api/v1/agent/sessions/:sessionId/voice-message
- Accepts multipart/form-data audio file (any format Whisper supports)
- Transcribes via OpenAI Whisper API (routed through existing proxy)
- If a task is currently running in the session → hard-interrupts it first
(same cancel+inject pattern as text inject, triggered by voice command)
- Otherwise → starts a fresh task with the transcript
- Returns { sessionId, taskId, transcript } so client can subscribe to WS stream
This enables WhatsApp-style push-to-talk and doubles as an async voice
interrupt into any active agent workflow, bypassing the need for speaker
diarization (whoever presses record owns the message).
New files:
infrastructure/stt/openai-stt.service.ts — OpenAI Whisper client,
manually builds multipart/form-data, supports self-signed proxy cert
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements a two-level abort controller design to support real-time
interruption when the user speaks while the agent is still responding:
sessionAbortController (session-scoped)
- Created once when startSession() is called
- Fired only by terminateSession() (user hangs up)
- Propagated into each turn via addEventListener
turnAbort (per-turn, stored as handle.currentTurnAbort)
- Created fresh at the start of each executeTurn() call
- Stored on the VoiceSessionHandle so injectMessage() can abort it
- When a new inject arrives while a turn is running, injectMessage()
calls turnAbort.abort() BEFORE enqueuing the new message
Interruption flow:
1. User speaks mid-response → LiveKit stops TTS playback (client-side)
2. STT utterance → POST voice/inject → injectMessage() fires
3. handle.currentTurnAbort.abort() called → sets aborted flag
4. for-await loop checks turnAbort.signal.aborted on next SDK event → break
5. catch block NOT reached (break ≠ exception) → no error event emitted
6. finally block saves partial text with "[中断]" suffix to history
7. New message dequeued → fresh executeTurn() starts immediately
Why no "Agent error" message plays to the user:
- break exits the for-await loop silently, not via exception
- The catch block's error-event emission is guarded by err?.name !== 'AbortError'
AND requires an actual exception; a plain break never enters catch
- Empty or partial responses are filtered by `if response:` in agent.py
Also update module-level JSDoc with full architecture explanation covering
the long-lived run loop design, two-level abort hierarchy, tenant context
injection pattern, and SDK session resume across turns.
Update agent.py module docstring to document voice session lifecycle and
interruption flow for future maintainers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the per-turn POST /tasks approach for voice calls with a
long-lived agent run loop tied to the call lifecycle:
agent-service:
- Add AsyncQueue<T> utility for blocking message relay
- Add VoiceSessionManager: spawns one background run loop per voice call,
accepts injected messages, terminates cleanly on hangup
- Add VoiceSessionController with 3 endpoints:
POST /api/v1/agent/sessions/voice/start (call start)
POST /api/v1/agent/sessions/:id/voice/inject (each speech turn)
DELETE /api/v1/agent/sessions/:id/voice (user hung up)
- Register VoiceSessionManager + VoiceSessionController in agent.module.ts
voice-agent:
- AgentServiceLLM: add start_voice_session(), terminate_voice_session(),
inject_text_message() (voice/inject-aware), _do_inject_voice()
- AgentServiceLLMStream._run(): use voice/inject path when voice session
is active; fall back to per-task POST for text-chat / non-SDK engines
- entrypoint(): call start_voice_session() after session.start();
register _on_room_disconnect that calls terminate_voice_session()
so the agent is always killed when the user hangs up
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two issues fixed:
1. agent.controller.ts — on the FIRST task of each session, write title+voiceMode
into session.metadata so the client can display a meaningful conversation title:
- Text sessions: metadata.title = first 40 chars of user prompt
- Voice sessions: metadata.title = '' + metadata.voiceMode = true
(Flutter renders these as '语音对话 M/D HH:mm')
titleSet flag prevents overwriting the title on subsequent turns of the same session.
2. session.controller.ts — listSessions() now returns a DTO instead of the raw entity.
systemPrompt is an internal engine instruction and is explicitly excluded from the
response. The client receives { id, status, engineType, metadata, createdAt, updatedAt }.
1. Remove on_enter greeting entirely (no more race condition)
2. voice-agent sends voiceMode: true when engine_type is claude_agent_sdk
3. AgentController.runTaskStream() filters thinking, tool_use, tool_result
events in voice mode — only text, completed, error reach the client
4. Detailed logging: each event logged with [FILTERED-voice] tag when skipped
Claude API mode is completely unaffected (voiceMode defaults to false).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude API supports up to 32MB PDFs; base64 encoding adds ~33% overhead.
50mb body limit covers the maximum single-document upload case.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PDF files were incorrectly wrapped as type:'image' content blocks,
causing Claude API to reject them as "Invalid image data".
- conversation-context.service: check mediaType for application/pdf,
use type:'document' block (Anthropic native PDF support) instead
- claude-agent-sdk-engine: detect both 'image' and 'document' blocks
when deciding to build multimodal SDK prompt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The direct `import * as express from 'express'` caused a
MODULE_NOT_FOUND error in the Docker production image since express
is only available as a transitive dependency via @nestjs/platform-express.
Use NestExpressApplication.useBodyParser() which is the official NestJS API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SDK engine now constructs AsyncIterable<SDKUserMessage> with image
content blocks when attachments are present in conversationHistory,
using the SDK's native multimodal prompt format
- CLI engine logs a warning when images are detected, since the `-p`
flag only accepts text (upstream Claude CLI limitation)
- Both SDK and API engines now fully support multimodal image input
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two major features in this commit:
1. Streaming Markdown Rendering Optimization
- Replace deprecated flutter_markdown with gpt_markdown (active, AI-optimized)
- Real-time markdown rendering during streaming (was showing raw syntax)
- Solid block cursor (█) instead of AnimationController blink
- 80ms token throttle buffer reducing rebuilds from per-token to ~12.5/sec
- RepaintBoundary isolation for markdown widget repaints
- StreamTextWidget simplified from StatefulWidget to StatelessWidget
2. Multimodal Image Input (camera + gallery + display)
- Flutter: image_picker for gallery/camera, base64 encoding, attachment
preview strip with delete, thumbnails in sent messages
- Data layer: List<String>? → List<Map<String, dynamic>>? for structured
attachment payloads through datasource/repository/usecase
- ChatAttachment model with base64Data, mediaType, fileName
- ChatMessage entity + ChatMessageModel both support attachments field
- Backend DTO, Entity (JSONB), Controller, ConversationContextService
all extended to receive, store, and reconstruct Anthropic image
content blocks in loadContext()
- Claude API engine skips duplicate user message when history already
ends with multimodal content blocks
- NestJS body parser limit raised to 10MB for base64 image payloads
- Android CAMERA permission added to manifest
- Image.memory uses cacheWidth/cacheHeight for memory efficiency
- Max 5 images per message enforced in UI
Data flow:
ImagePicker → base64Encode → ChatAttachment → POST body →
DB (JSONB) → loadContext → Anthropic image content blocks → Claude API
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6 rounds of systematic audit identified and fixed 14 bugs across
backend controller and Flutter client:
## Backend (agent.controller.ts)
Security & Tenant Isolation:
- Add @TenantId + ForbiddenException check to cancelTask, injectMessage,
approveCommand — all 4 write endpoints now enforce tenant isolation
- Add tenantId check on session reuse in executeTask to prevent
cross-tenant session hijacking
Architecture & Correctness:
- Extract shared runTaskStream() from inline fire-and-forget block,
used by both executeTask and injectMessage to reduce duplication
- Use session.engineType (not getActiveEngine()) in cancelTask,
injectMessage, approveCommand — fixes wrong-engine-cancel when
global engine config is switched after task creation
- Add concurrent task prevention: executeTask checks for existing
RUNNING task on same session and cancels it before starting new one
- Add runningTasks Map to track task promises, awaitTaskCleanup()
helper with 3s timeout for inject to wait for partial text save
- captureSdkSessionId() captures SDK session ID into metadata
without DB save (callers persist), preventing fire-and-forget race
Cancel/Reject Improvements:
- cancelTask: idempotent (returns early if already CANCELLED/COMPLETED),
session stays 'active' (was 'cancelled'), emits cancelled WS event
- approveCommand reject: session stays 'active' (was 'cancelled'),
now emits cancelled WS event so Flutter stream listeners clean up
- approveCommand approved: collect text events and save assistant
response to conversation history on completion (was missing)
Minor:
- task.result! non-null assertion → task.result ?? 'Unknown error'
- Add findRunningBySessionId() to TaskRepository
## Flutter
API Contract Fix:
- approveCommand: route changed from /api/v1/ops/approvals/:id/approve
to /api/v1/agent/tasks/:id/approve with {approved: true} body
- rejectCommand: route changed from /api/v1/ops/approvals/:id/reject
to /api/v1/agent/tasks/:id/approve with {approved: false} body
Resource Management:
- ChatNotifier.dispose() now disconnects WebSocket to prevent
connection leak when navigating away from chat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend (agent-engine.port.ts):
- Add `cancelled` event type: emitted when a task is cancelled (user-initiated
or injection), so Flutter can close the old stream cleanly
- Add `task_info` event type: emitted after inject to pass the new taskId to
the client, enabling cancel/re-inject on the replacement task
Flutter (features/chat/):
- ChatState: track current `taskId` alongside `sessionId`; clear on completion
or error
- Handle `TaskInfoEvent`: update taskId in state when server issues a new task
- Handle `CancelledEvent`: treat as stream termination (agentStatus → idle)
- MessageType.interrupted: new UI node (warning style) for mid-stream cancels
- _inject(): send text as an inject request while streaming; backend cancels
the current task and starts a new one with the injected message
- Input area: during streaming, hint changes to "追加指令...", Enter key calls
_inject() instead of _send(), and both inject-send + stop buttons are shown
- isAwaitingApproval kept separate from isStreaming so approval flow is not
blocked by inject mode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously AgentSkillService wrote skills to public.agent_skills (TypeORM
entity with tenantId column filter), while ClaudeAgentSdkEngine read from
it0_t_{tenantId}.skills (per-tenant schema). The two tables were never
connected, so any skill added via the CRUD API was invisible to the agent.
This fix:
- Rewrites AgentSkillService to use DataSource + raw SQL against the
per-tenant schema it0_t_{tenantId}.skills
- Maps API fields: script→content, enabled→is_active
- Removes AgentSkillRepository and AgentSkill entity from module (no longer needed)
- CRUD API response shape is unchanged (fields mapped back to script/enabled)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Load active skills from the tenant's schema `skills` table and append
them to the system prompt before passing to the Claude Agent SDK. This
closes the gap where skills existed in the DB but were never surfaced
to the agent during task execution.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace prompt-prefix workaround with SDK's native resume mechanism.
Each tenant gets isolated HOME directory (/data/claude-tenants/{tenantId})
to prevent cross-tenant session file mixing. SDK session IDs are persisted
in session.metadata for cross-request resume support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement DB-based conversation message storage (engine-agnostic) that
works across both Claude API and Agent SDK engines. Add ChatGPT/Claude-style
conversation history drawer in Flutter with date-grouped session list,
session switching, and new chat functionality.
Backend: entity, repository, context service, migration 004, session/message
API endpoints. Flutter: ConversationDrawer, sessionId flow from backend
response via SessionInfoEvent, session list/switch/delete support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Claude API engine now uses streaming API (messages.stream) for real-time
text delta output instead of waiting for full response
- Agent controller accepts optional engineType body parameter to allow
callers (e.g. voice pipeline) to select a specific engine
- Fix voice_test_page.dart compilation error: replace audioplayers (not
installed) with flutter_sound (already in pubspec.yaml)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add finished guard so that once a task reaches completed/error terminal
state, subsequent events don't flip the status back.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SDK sends text both via stream_event deltas (token-level) and assistant
message (complete block). Track hasStreamedText flag per session to skip
duplicate text extraction from assistant messages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root causes found:
1. SDK engine only emitted 'completed' without 'text' events because
mapSdkMessage skipped text blocks in 'assistant' messages (assumed
stream_event deltas would provide them, but SDK didn't send deltas)
2. Voice pipeline read evt_data.data.content but engine events are flat
(evt_data.content) — so even if text arrived, it was never extracted
Fixes:
- Extract text/thinking blocks from assistant messages in SDK engine
- Fix voice pipeline to read content directly from evt_data, not nested
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Log every SDK message type, event emission, and stream lifecycle
to diagnose why text events are missing in voice-agent flow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Buffer stream events when no WS clients are subscribed yet, then replay
them when a client subscribes. This eliminates the race condition where
events are lost between task creation and WS subscription.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend:
- Add includePartialMessages: true to SDK query options
- Handle stream_event/content_block_delta for real-time text streaming
- Skip text/thinking blocks from complete assistant messages (already
streamed via deltas) to avoid duplication
- Change default result summary to empty string
Flutter:
- Only show CompletedEvent summary when no assistant text was streamed
(prevents duplicate message bubble)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Socket.IO requires its own handshake protocol (EIO=4) which Kong cannot
proxy as a plain WebSocket upgrade, causing 502 Bad Gateway. Switch to
@nestjs/platform-ws (WsAdapter) with manual session room tracking so
Flutter's IOWebSocketChannel can connect directly.
Also add ws/wss protocols to Kong WebSocket routes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TenantAwareRepository.getRepository() was calling createQueryRunner()
without ever releasing it, causing database connection pool exhaustion.
This caused ops-service (and eventually other services) to hang on
all API requests once the pool filled up.
Replaced getRepository() with withRepository() pattern that wraps
operations in try/finally to always release the QueryRunner.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SDK blocks bypassPermissions when running as root for security.
Add non-root 'appuser' to Dockerfile.service and update volume
mounts to use /home/appuser/.claude paths.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- bypassPermissions blocked by SDK when running as root
- Switch to acceptEdits with canUseTool for programmatic control
- Mount .claude.json config file into container
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In a Docker container without TTY, permissionMode 'default' blocks
waiting for interactive permission prompts. Switch to bypassPermissions
with canUseTool callback for programmatic risk-based access control.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
tsc with module=commonjs converts `await import()` to require(),
which breaks ESM-only packages. Use Function('return import()')
workaround to preserve native dynamic import at runtime.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Mount ~/.claude/ into agent-service container for OAuth token access
- Switch default engine to claude_agent_sdk
- Remove ANTHROPIC_API_KEY from env in subscription mode so SDK uses OAuth
- Keep API key mode for per-tenant billing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Follow iConsulting pattern: set NODE_TLS_REJECT_UNAUTHORIZED=0 when
ANTHROPIC_BASE_URL is configured, enabling connection through the
self-signed proxy at 67.223.119.33.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change AGENT_ENGINE_TYPE from claude_code_cli to claude_api in docker-compose
- Add ANTHROPIC_BASE_URL env var support to claude-api-engine
- Add ANTHROPIC_BASE_URL to agent-service environment in docker-compose
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add restart: unless-stopped to all 12 Docker services
- Add process.on(unhandledRejection/uncaughtException) to all 7 service main.ts
- Fix handleEventTrigger using tenantId UUID as schema name instead of slug lookup
- Wrap Redis event subscription callbacks in try/catch
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kong handles JWT validation at the gateway level. Service-level
AuthGuard('jwt') fails because services don't register a Passport
JWT strategy (only auth-service does). Removed from 17 controllers
across ops, inventory, monitor, comm, audit, and agent services.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace TenantAwareRepository with standard @InjectRepository
(TenantAwareRepository requires AsyncLocalStorage tenant context
middleware which agent-service does not have)
- Replace @TenantId() decorator with @Headers('x-tenant-id')
for direct HTTP header extraction
- Return defaults gracefully when no tenant is selected
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Agent-service does not have a registered Passport JWT strategy —
JWT validation is handled by Kong API gateway. The AuthGuard was
causing 500 "Unknown authentication strategy" errors on all
new controller endpoints.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement missing REST API endpoints that the web-admin frontend
pages were calling but had no backend support:
- GET/POST/PUT /api/v1/agent-config (engine, prompt, turns, budget, tools)
- GET/POST/PUT/DELETE /api/v1/agent/skills (CRUD for agent skills)
- GET/POST/PUT/DELETE /api/v1/agent/hooks (CRUD for hook scripts)
Each endpoint includes entity, repository, service, and controller
layers following the existing DDD + tenant-aware patterns.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>