Whisper detects language from audio content — speaks Chinese gets Chinese,
speaks English gets English. App language setting is irrelevant to STT.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Flutter: language='auto' omits the language field → backend receives none
- Backend: no language field → passes undefined to STT service
- STT service: language=undefined → omits language param from Whisper request
- Whisper auto-detects language per utterance when no hint is provided
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add POST /api/v1/agent/transcribe endpoint (STT only, no agent trigger)
- Add transcribeAudio() to chat datasource and provider
- VoiceMicButton now fills the text input field with transcript;
user reviews and sends manually
- Add OPENAI_API_KEY/OPENAI_BASE_URL to agent-service in docker-compose
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New VoiceMicButton widget (press-and-hold to record, release to send):
- Records audio to a temp .m4a file via the `record` package
- Slide-up gesture cancels recording without sending
- Pulsing red mic icon + "松开发送/松开取消" feedback during recording
New flow for voice messages:
1. Temp "🎤 识别中..." bubble shown immediately
2. Audio uploaded to POST /api/v1/agent/sessions/:id/voice-message
(multipart/form-data; backend runs Whisper STT)
3. Placeholder replaced with real transcript
4. WS stream subscribed via new subscribeExistingTask() to receive
agent's streaming response — same pipeline as text chat
Voice messages act as async interrupts: if the agent is mid-task the
backend hard-cancels it before processing the new voice command,
so whoever presses the mic button always takes priority.
Files changed:
chat_remote_datasource.dart — sendVoiceMessage() multipart upload
chat_repository.dart — subscribeExistingTask() interface method
chat_repository_impl.dart — implement subscribeExistingTask(); fix
sendVoiceMessage() stub
chat_providers.dart — ChatNotifier.sendVoiceMessage()
voice_mic_button.dart — NEW press-and-hold recording widget
chat_page.dart — mic button added to input area
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two major features in this commit:
1. Streaming Markdown Rendering Optimization
- Replace deprecated flutter_markdown with gpt_markdown (active, AI-optimized)
- Real-time markdown rendering during streaming (was showing raw syntax)
- Solid block cursor (█) instead of AnimationController blink
- 80ms token throttle buffer reducing rebuilds from per-token to ~12.5/sec
- RepaintBoundary isolation for markdown widget repaints
- StreamTextWidget simplified from StatefulWidget to StatelessWidget
2. Multimodal Image Input (camera + gallery + display)
- Flutter: image_picker for gallery/camera, base64 encoding, attachment
preview strip with delete, thumbnails in sent messages
- Data layer: List<String>? → List<Map<String, dynamic>>? for structured
attachment payloads through datasource/repository/usecase
- ChatAttachment model with base64Data, mediaType, fileName
- ChatMessage entity + ChatMessageModel both support attachments field
- Backend DTO, Entity (JSONB), Controller, ConversationContextService
all extended to receive, store, and reconstruct Anthropic image
content blocks in loadContext()
- Claude API engine skips duplicate user message when history already
ends with multimodal content blocks
- NestJS body parser limit raised to 10MB for base64 image payloads
- Android CAMERA permission added to manifest
- Image.memory uses cacheWidth/cacheHeight for memory efficiency
- Max 5 images per message enforced in UI
Data flow:
ImagePicker → base64Encode → ChatAttachment → POST body →
DB (JSONB) → loadContext → Anthropic image content blocks → Claude API
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6 rounds of systematic audit identified and fixed 14 bugs across
backend controller and Flutter client:
## Backend (agent.controller.ts)
Security & Tenant Isolation:
- Add @TenantId + ForbiddenException check to cancelTask, injectMessage,
approveCommand — all 4 write endpoints now enforce tenant isolation
- Add tenantId check on session reuse in executeTask to prevent
cross-tenant session hijacking
Architecture & Correctness:
- Extract shared runTaskStream() from inline fire-and-forget block,
used by both executeTask and injectMessage to reduce duplication
- Use session.engineType (not getActiveEngine()) in cancelTask,
injectMessage, approveCommand — fixes wrong-engine-cancel when
global engine config is switched after task creation
- Add concurrent task prevention: executeTask checks for existing
RUNNING task on same session and cancels it before starting new one
- Add runningTasks Map to track task promises, awaitTaskCleanup()
helper with 3s timeout for inject to wait for partial text save
- captureSdkSessionId() captures SDK session ID into metadata
without DB save (callers persist), preventing fire-and-forget race
Cancel/Reject Improvements:
- cancelTask: idempotent (returns early if already CANCELLED/COMPLETED),
session stays 'active' (was 'cancelled'), emits cancelled WS event
- approveCommand reject: session stays 'active' (was 'cancelled'),
now emits cancelled WS event so Flutter stream listeners clean up
- approveCommand approved: collect text events and save assistant
response to conversation history on completion (was missing)
Minor:
- task.result! non-null assertion → task.result ?? 'Unknown error'
- Add findRunningBySessionId() to TaskRepository
## Flutter
API Contract Fix:
- approveCommand: route changed from /api/v1/ops/approvals/:id/approve
to /api/v1/agent/tasks/:id/approve with {approved: true} body
- rejectCommand: route changed from /api/v1/ops/approvals/:id/reject
to /api/v1/agent/tasks/:id/approve with {approved: false} body
Resource Management:
- ChatNotifier.dispose() now disconnects WebSocket to prevent
connection leak when navigating away from chat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend (agent-engine.port.ts):
- Add `cancelled` event type: emitted when a task is cancelled (user-initiated
or injection), so Flutter can close the old stream cleanly
- Add `task_info` event type: emitted after inject to pass the new taskId to
the client, enabling cancel/re-inject on the replacement task
Flutter (features/chat/):
- ChatState: track current `taskId` alongside `sessionId`; clear on completion
or error
- Handle `TaskInfoEvent`: update taskId in state when server issues a new task
- Handle `CancelledEvent`: treat as stream termination (agentStatus → idle)
- MessageType.interrupted: new UI node (warning style) for mid-stream cancels
- _inject(): send text as an inject request while streaming; backend cancels
the current task and starts a new one with the injected message
- Input area: during streaming, hint changes to "追加指令...", Enter key calls
_inject() instead of _send(), and both inject-send + stop buttons are shown
- isAwaitingApproval kept separate from isStreaming so approval flow is not
blocked by inject mode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implement DB-based conversation message storage (engine-agnostic) that
works across both Claude API and Agent SDK engines. Add ChatGPT/Claude-style
conversation history drawer in Flutter with date-grouped session list,
session switching, and new chat functionality.
Backend: entity, repository, context service, migration 004, session/message
API endpoints. Flutter: ConversationDrawer, sessionId flow from backend
response via SessionInfoEvent, session list/switch/delete support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>