hailin/it0 - it0 - AI Wolves Team

Commit Graph

Author	SHA1	Message	Date
hailin	186234bae2	fix: increase STT silence_duration_ms to prevent choppy transcription Default silence_duration_ms=350 is too aggressive for Chinese speech, causing sentences to be fragmented into 1-3 character chunks. Increase to 800ms and raise VAD threshold to 0.6 so the STT waits longer before finalizing a turn, producing complete sentences for LLM processing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 18:37:13 -08:00
hailin	a5c95b460a	fix: patch aiohttp SSL verification for OpenAI Realtime STT WebSocket The OpenAI Realtime STT uses aiohttp WebSocket connections (not httpx), so the existing httpx verify=False fix does not apply. LiveKit's http_context creates aiohttp.TCPConnector without ssl=False, causing SSL certificate verification errors when OPENAI_BASE_URL points to a proxy with a self-signed certificate. Monkey-patch http_context._new_session_ctx to inject ssl=False into the aiohttp connector, fixing the "CERTIFICATE_VERIFY_FAILED" error for Realtime STT WebSocket connections. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 18:29:59 -08:00
hailin	5460be8c04	feat: add TTS voice and style settings to Flutter app Add user-configurable TTS voice and tone style settings that flow from the Flutter app through the backend to the voice-agent at call time. ## Flutter App (it0_app) ### Domain Layer - app_settings.dart: Add `ttsVoice` (default: 'coral') and `ttsStyle` (default: '') fields to AppSettings entity with copyWith support ### Data Layer - settings_datasource.dart: Add SharedPreferences keys `settings_tts_voice` and `settings_tts_style` for local persistence in loadSettings(), saveSettings(), and clearSettings() ### Presentation Layer - settings_providers.dart: Add `setTtsVoice()` and `setTtsStyle()` methods to SettingsNotifier for Riverpod state management - settings_page.dart: Add "语音" (Voice) settings group between Notifications and Security groups with: - Voice picker: 13 OpenAI voices with gender/style labels (e.g. "女 · 温暖" (female, warm), "男 · 沉稳" (male, steady), "中性" (neutral)) in a BottomSheet - Style picker: 5 presets (专业干练/温柔耐心/轻松活泼/严肃正式/科幻AI, i.e. professional and efficient / gentle and patient / relaxed and lively / serious and formal / sci-fi AI) as ChoiceChips + custom text input field + reset button ### Call Flow - agent_call_page.dart: Send `tts_voice` and `tts_style` in the POST body when requesting a LiveKit token at call initiation ## Backend ### voice-service (Python/FastAPI) - livekit_token.py: Accept optional `tts_voice` and `tts_style` via Pydantic TokenRequest body model; embed them in RoomAgentDispatch metadata JSON alongside auth_header (backward compatible) ### voice-agent (Python/LiveKit Agents) - agent.py: Extract `tts_voice` and `tts_style` from ctx.job.metadata; use them when creating openai_plugin.TTS(): user-selected voice overrides config default, user-selected style overrides default instructions. Falls back to config defaults when not provided. ## Data Flow Flutter Settings → SharedPreferences → POST /livekit/token body → voice-service embeds in RoomAgentDispatch metadata → voice-agent reads from ctx.job.metadata → TTS creation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 09:38:15 -08:00
hailin	705647d732	feat: upgrade TTS to gpt-4o-mini-tts with voice instructions - Switch from tts-1 to gpt-4o-mini-tts for lower latency and better quality - Change voice from alloy to coral - Add Chinese speech instructions for natural tone control Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 08:19:05 -08:00
hailin	ba83e433d3	feat: enable OpenAI Realtime STT for streaming speech recognition Switch from batch STT (gpt-4o-transcribe via /audio/transcriptions) to streaming Realtime API (WebSocket). This eliminates the ~2s batch upload+process latency per utterance. Also updated nginx proxy on 67.223.119.33 to support WebSocket upgrade for /v1/realtime endpoint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 07:49:25 -08:00
hailin	e302891f16	fix: disable SSL verify for self-signed OpenAI proxy + handle no-user-msg - Pass httpx.AsyncClient(verify=False) to OpenAI STT/TTS to support self-signed certificate on OPENAI_BASE_URL proxy - Handle generate_reply calls with no user message by falling back to system/developer instructions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:39:49 -08:00
hailin	4d47c6a955	fix: remove wait_for_participant — room not connected in rtc_session mode In livekit-agents v1.x @server.rtc_session() pattern, ctx.room is not yet connected when entrypoint is called. session.start() handles room connection internally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:15:37 -08:00
hailin	2112445191	fix: voice-agent crash — add room I/O options and filter AgentConfigUpdate - Add room_input_options/room_output_options to session.start() so agent binds audio I/O and stays in the room - Add wait_for_participant() before starting session - Filter AgentConfigUpdate items in agent_llm.py (no 'role' attribute) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:08:07 -08:00
hailin	00be878a95	fix: refactor voice-agent to official LiveKit v1.x AgentServer pattern Replace deprecated WorkerOptions(entrypoint_fnc=...) with AgentServer() + @server.rtc_session() decorator. Use server.setup_fnc for prewarm. Remove manual ctx.connect() and ctx.wait_for_participant() calls that prevented the pipeline from properly wiring up VAD→STT→LLM→TTS. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 12:31:31 -08:00
hailin	75b14d5200	fix: use RoomOptions instead of deprecated RoomInputOptions RoomInputOptions is deprecated in livekit-agents 1.4.x. Switch to RoomOptions with explicit audio_input/audio_output enabled. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 11:32:36 -08:00
hailin	23b5bce983	fix: extract auth header from job.metadata instead of agent_dispatch LiveKit passes RoomAgentDispatch metadata through as job.metadata (protobuf field), not via a separate agent_dispatch object. Also use room_io.RoomInputOptions for participant targeting (livekit-agents 1.x). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 11:04:02 -08:00
hailin	f1d50e43f1	fix: update AgentSession.start() for livekit-agents 1.x API livekit-agents 1.x removed the 'participant' parameter from AgentSession.start(). Use room_input_options with participant_identity instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 10:31:04 -08:00
hailin	acfdae7773	fix: use livekit-api package for voice-service token endpoint The livekit package is the client SDK and doesn't include the server-side API module. Switch to livekit-api which provides AccessToken, VideoGrants, RoomAgentDispatch, and RoomConfiguration needed for token generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 09:49:11 -08:00
hailin	112c445143	fix: resolve websockets version conflict and use CPU-only torch - Upgrade websockets from ==12.0 to >=13.0 (openai[realtime] requires >=13) - Install torch CPU-only build separately in Dockerfile to avoid ~2GB CUDA download - Remove torch from requirements.txt (installed via --index-url cpu wheel) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 09:02:31 -08:00
hailin	94a14b3104	feat: migrate voice call from WebSocket/PCM to LiveKit WebRTC Real-time voice conversation architecture migration: WebSocket → LiveKit WebRTC ## Background The original voice call architecture used a FastAPI WebSocket to transmit raw PCM, with the pipeline executing serially (VAD → batch STT → Agent → sentence accumulation → batch TTS) and a time-to-first-audio of about 6 seconds. After migrating to the LiveKit Agents framework, WebRTC transport plus pipeline parallelism is expected to reduce latency to 1.5-2 seconds. ## Architecture Flutter App ←── WebRTC (Opus/UDP) ──→ LiveKit Server ←──→ Voice Agent livekit_client (self-hosted, Go) (Python, LiveKit Agents SDK) ├─ VAD (Silero) ├─ STT (faster-whisper / OpenAI) ├─ LLM (custom plugin → agent-service) └─ TTS (Kokoro / OpenAI) Key design: the LLM does not call the Claude API directly; instead a custom plugin proxies to the existing agent-service, preserving Tool Use, conversation history, tenant isolation, and related capabilities. ## New services ### voice-agent (packages/services/voice-agent/) LiveKit Agent Worker, containing: - agent.py: entry point; prewarm() preloads models, entrypoint() orchestrates the session - plugins/agent_llm.py: custom LLM plugin that proxies the agent-service API - POST /api/v1/agent/tasks creates a task - WS /ws/agent subscribes to streaming events (stream_event) - Reuses session_id across turns to keep conversation context - plugins/whisper_stt.py: local faster-whisper STT (batch recognition) - plugins/kokoro_tts.py: local Kokoro-82M TTS (24kHz PCM) - config.py: pydantic-settings configuration ### LiveKit Server (deploy/docker/) - livekit.yaml: signaling port 7880, RTC TCP 7881, UDP 50000-50200 - docker-compose.yml: adds livekit-server + voice-agent containers ### LiveKit Token endpoint - voice-service/src/api/livekit_token.py: POST /api/v1/voice/livekit/token generates a Room JWT and embeds auth_header in AgentDispatch metadata ## Flutter client changes - agent_call_page.dart: simplified from ~814 lines to ~380 lines - Replaced: WebSocketChannel, AudioRecorder, PcmPlayer, manual heartbeat/reconnect - Now uses: Room.connect(), setMicrophoneEnabled(true), LiveKit event listeners - Waveform animation now uses participant.audioLevel - pubspec.yaml: add livekit_client: ^2.3.0 - app_config.dart: add livekitUrl field - api_endpoints.dart: add livekitToken endpoint ## Configuration (environment variables) - STT_PROVIDER: local (default, faster-whisper) / openai - TTS_PROVIDER: local (default, Kokoro) / openai - WHISPER_MODEL: base (default) / small / medium / large - WHISPER_LANGUAGE: zh (default) - KOKORO_VOICE: zf_xiaoxiao (default) - DEVICE: cpu (default) / cuda ## Unchanged - agent-service: not modified at all; voice-agent calls it through the existing API - voice-service core: pipeline/STT/TTS/VAD retained (Twilio fallback) - Kong gateway: existing routes unchanged - Database: no schema changes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 08:55:33 -08:00
hailin	4987cad881	fix: increase body parser limit to 50mb for large PDF uploads Claude API supports up to 32MB PDFs; base64 encoding adds ~33% overhead. 50mb body limit covers the maximum single-document upload case. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 05:35:43 -08:00
hailin	c9367ee22a	fix: PDF attachments sent as document blocks instead of image blocks PDF files were incorrectly wrapped as type:'image' content blocks, causing Claude API to reject them as "Invalid image data". - conversation-context.service: check mediaType for application/pdf, use type:'document' block (Anthropic native PDF support) instead - claude-agent-sdk-engine: detect both 'image' and 'document' blocks when deciding to build multimodal SDK prompt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 05:27:41 -08:00
hailin	9b467924a0	fix: add attachments JSONB column to conversation_messages schema Update migration files to include the attachments column for multimodal image storage. Also add ALTER TABLE migration for existing deployments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 04:18:35 -08:00
hailin	2c657e2b4c	fix: use NestJS native useBodyParser instead of direct express import The direct `import * as express from 'express'` caused a MODULE_NOT_FOUND error in the Docker production image since express is only available as a transitive dependency via @nestjs/platform-express. Use NestExpressApplication.useBodyParser() which is the official NestJS API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 04:01:54 -08:00
hailin	b9c3bfdf91	feat: add multimodal image support to Claude Agent SDK engine - SDK engine now constructs AsyncIterable<SDKUserMessage> with image content blocks when attachments are present in conversationHistory, using the SDK's native multimodal prompt format - CLI engine logs a warning when images are detected, since the `-p` flag only accepts text (upstream Claude CLI limitation) - Both SDK and API engines now fully support multimodal image input Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 03:38:59 -08:00
hailin	e4c2505048	feat: add multimodal image input with streaming markdown optimization Two major features in this commit: 1. Streaming Markdown Rendering Optimization - Replace deprecated flutter_markdown with gpt_markdown (active, AI-optimized) - Real-time markdown rendering during streaming (was showing raw syntax) - Solid block cursor (█) instead of AnimationController blink - 80ms token throttle buffer reducing rebuilds from per-token to ~12.5/sec - RepaintBoundary isolation for markdown widget repaints - StreamTextWidget simplified from StatefulWidget to StatelessWidget 2. Multimodal Image Input (camera + gallery + display) - Flutter: image_picker for gallery/camera, base64 encoding, attachment preview strip with delete, thumbnails in sent messages - Data layer: List<String>? → List<Map<String, dynamic>>? for structured attachment payloads through datasource/repository/usecase - ChatAttachment model with base64Data, mediaType, fileName - ChatMessage entity + ChatMessageModel both support attachments field - Backend DTO, Entity (JSONB), Controller, ConversationContextService all extended to receive, store, and reconstruct Anthropic image content blocks in loadContext() - Claude API engine skips duplicate user message when history already ends with multimodal content blocks - NestJS body parser limit raised to 10MB for base64 image payloads - Android CAMERA permission added to manifest - Image.memory uses cacheWidth/cacheHeight for memory efficiency - Max 5 images per message enforced in UI Data flow: ImagePicker → base64Encode → ChatAttachment → POST body → DB (JSONB) → loadContext → Anthropic image content blocks → Claude API Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 03:24:17 -08:00
hailin	50dbb641a3	fix: comprehensive hardening of agent task cancel/inject/approve flows 6 rounds of systematic audit identified and fixed 14 bugs across backend controller and Flutter client: ## Backend (agent.controller.ts) Security & Tenant Isolation: - Add @TenantId + ForbiddenException check to cancelTask, injectMessage, approveCommand: all 4 write endpoints now enforce tenant isolation - Add tenantId check on session reuse in executeTask to prevent cross-tenant session hijacking Architecture & Correctness: - Extract shared runTaskStream() from inline fire-and-forget block, used by both executeTask and injectMessage to reduce duplication - Use session.engineType (not getActiveEngine()) in cancelTask, injectMessage, approveCommand; fixes wrong-engine-cancel when global engine config is switched after task creation - Add concurrent task prevention: executeTask checks for existing RUNNING task on same session and cancels it before starting new one - Add runningTasks Map to track task promises, awaitTaskCleanup() helper with 3s timeout for inject to wait for partial text save - captureSdkSessionId() captures SDK session ID into metadata without DB save (callers persist), preventing fire-and-forget race Cancel/Reject Improvements: - cancelTask: idempotent (returns early if already CANCELLED/COMPLETED), session stays 'active' (was 'cancelled'), emits cancelled WS event - approveCommand reject: session stays 'active' (was 'cancelled'), now emits cancelled WS event so Flutter stream listeners clean up - approveCommand approved: collect text events and save assistant response to conversation history on completion (was missing) Minor: - task.result! non-null assertion → task.result ?? 'Unknown error' - Add findRunningBySessionId() to TaskRepository ## Flutter API Contract Fix: - approveCommand: route changed from /api/v1/ops/approvals/:id/approve to /api/v1/agent/tasks/:id/approve with {approved: true} body - rejectCommand: route changed from /api/v1/ops/approvals/:id/reject to /api/v1/agent/tasks/:id/approve with {approved: false} body Resource Management: - ChatNotifier.dispose() now disconnects WebSocket to prevent connection leak when navigating away from chat Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 22:20:46 -08:00
hailin	d5f663f7af	feat: inject-message support for mid-stream task interruption Backend (agent-engine.port.ts): - Add `cancelled` event type: emitted when a task is cancelled (user-initiated or injection), so Flutter can close the old stream cleanly - Add `task_info` event type: emitted after inject to pass the new taskId to the client, enabling cancel/re-inject on the replacement task Flutter (features/chat/): - ChatState: track current `taskId` alongside `sessionId`; clear on completion or error - Handle `TaskInfoEvent`: update taskId in state when server issues a new task - Handle `CancelledEvent`: treat as stream termination (agentStatus → idle) - MessageType.interrupted: new UI node (warning style) for mid-stream cancels - _inject(): send text as an inject request while streaming; backend cancels the current task and starts a new one with the injected message - Input area: during streaming, hint changes to "追加指令..." ("Add an instruction..."), Enter key calls _inject() instead of _send(), and both inject-send + stop buttons are shown - isAwaitingApproval kept separate from isStreaming so approval flow is not blocked by inject mode Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 21:33:50 -08:00
hailin	ce4e7840ec	fix: route AgentSkillService to per-tenant schema to match SDK engine Previously AgentSkillService wrote skills to public.agent_skills (TypeORM entity with tenantId column filter), while ClaudeAgentSdkEngine read from it0_t_{tenantId}.skills (per-tenant schema). The two tables were never connected, so any skill added via the CRUD API was invisible to the agent. This fix: - Rewrites AgentSkillService to use DataSource + raw SQL against the per-tenant schema it0_t_{tenantId}.skills - Maps API fields: script→content, enabled→is_active - Removes AgentSkillRepository and AgentSkill entity from module (no longer needed) - CRUD API response shape is unchanged (fields mapped back to script/enabled) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 21:21:36 -08:00
hailin	3278696f4c	feat: inject tenant skills into agent system prompt Load active skills from the tenant's schema `skills` table and append them to the system prompt before passing to the Claude Agent SDK. This closes the gap where skills existed in the DB but were never surfaced to the agent during task execution. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 20:42:15 -08:00
hailin	36d36acad4	fix: set tenantId when creating credentials in inventory-service The createCredential method was missing the tenantId assignment, causing a NOT NULL constraint violation on the credentials table. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 10:52:14 -08:00
hailin	51b348e609	feat: complete tenant member management (CRUD + delete tenant) Backend: add 5 missing endpoints to TenantController: - DELETE /tenants/:id (deprovision schema + cleanup) - GET /tenants/:id/members (query tenant schema users) - PATCH /tenants/:id/members/:memberId (change role) - DELETE /tenants/:id/members/:memberId (remove member) - PUT /tenants/:id (alias for frontend compatibility) Frontend: add member actions to tenant detail page: - Role column changed to dropdown selector - Added remove member button with confirmation - Added updateMember and removeMember mutations Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 10:00:09 -08:00
hailin	bc7e32061a	fix: improve voice call reconnection robustness Server side (session_router.py): - /reconnect now accepts sessions in "active" state (not just "disconnected") - When client reconnects to an active session, the old WebSocket/pipeline is automatically replaced when the new WebSocket connects - Only truly terminal states (e.g. "ended") return 409 Flutter side (agent_call_page.dart): - Distinguish terminal errors (404 session gone, 409 ended) from transient errors (network timeout, server unreachable) in reconnect loop - Terminal errors break immediately instead of wasting retry attempts - Extract _connectWebSocket() helper for cleaner reconnect flow - Add DioException handling for proper HTTP status code inspection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 07:33:34 -08:00
hailin	75083f23aa	debug: add TTS send_bytes logging to pipeline Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 06:19:18 -08:00
hailin	5be7f9c078	fix: resample OpenAI TTS output from 24kHz to 16kHz WAV OpenAI TTS returns 24kHz audio which Android MediaPlayer can't play via FlutterSound's pcm16WAV codec. Request raw PCM and resample to 16kHz before wrapping in WAV header, matching the local TTS format. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 05:38:39 -08:00
hailin	4456550393	feat: lazy-load local TTS/STT models on first request Local /synthesize and /transcribe endpoints now auto-load Kokoro/Whisper models on first call instead of returning 503 when not pre-loaded at startup. This allows switching between Local and OpenAI providers in the Flutter test page without requiring server restart. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 04:38:49 -08:00
hailin	cc0f06e2be	feat: SDK engine native resume with per-tenant HOME isolation Replace prompt-prefix workaround with SDK's native resume mechanism. Each tenant gets isolated HOME directory (/data/claude-tenants/{tenantId}) to prevent cross-tenant session file mixing. SDK session IDs are persisted in session.metadata for cross-request resume support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 02:27:38 -08:00
hailin	2403ce5636	feat: multi-turn conversation context management with session history UI Implement DB-based conversation message storage (engine-agnostic) that works across both Claude API and Agent SDK engines. Add ChatGPT/Claude-style conversation history drawer in Flutter with date-grouped session list, session switching, and new chat functionality. Backend: entity, repository, context service, migration 004, session/message API endpoints. Flutter: ConversationDrawer, sessionId flow from backend response via SessionInfoEvent, session list/switch/delete support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 19:04:35 -08:00
hailin	c02c2a9a11	feat: add OpenAI TTS/STT provider support in voice pipeline - Add STT_PROVIDER/TTS_PROVIDER config (local or openai) in settings - Pipeline uses OpenAI API for STT/TTS when provider is "openai" - Skip loading local models (Kokoro/faster-whisper) when using OpenAI - VAD (Silero) always loads for speech detection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 09:27:38 -08:00
hailin	f8f0d17820	fix: disable SSL verification for OpenAI proxy with self-signed cert Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 08:59:06 -08:00
hailin	d43baed3a5	feat: add OpenAI TTS/STT API endpoints for comparison testing - Add openai package to voice-service requirements - Add /api/v1/test/tts/synthesize-openai (tts-1/tts-1-hd/gpt-4o-mini-tts) - Add /api/v1/test/stt/transcribe-openai (gpt-4o-transcribe/whisper-1) - Add OPENAI_API_KEY and OPENAI_BASE_URL env vars to voice-service - Flutter test page: SegmentedButton to toggle Local/OpenAI provider - All endpoints maintain same response format for easy comparison Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 07:20:03 -08:00
hailin	5d4fd96d43	feat: streaming claude-api engine, engineType override, fix voice test page - Claude API engine now uses streaming API (messages.stream) for real-time text delta output instead of waiting for full response - Agent controller accepts optional engineType body parameter to allow callers (e.g. voice pipeline) to select a specific engine - Fix voice_test_page.dart compilation error: replace audioplayers (not installed) with flutter_sound (already in pubspec.yaml) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 05:30:11 -08:00
hailin	6e832c7615	feat: add voice I/O test page in Flutter settings - TTS: text input → Kokoro synthesis → audio playback - STT: long-press record → faster-whisper transcription - Round-trip: record → STT → TTS → playback - Added /api/v1/test route to Kong gateway for voice-service - Accessible from Settings → 语音 I/O 测试 (Voice I/O Test) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 05:16:10 -08:00
hailin	0bd050c80f	feat: add STT test and round-trip test to voice test page - STT: record from mic or upload audio file → faster-whisper transcription - Round-trip: record → STT → TTS → playback (full pipeline test) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 05:08:00 -08:00
hailin	0aa20cbc73	feat: add temporary TTS test page at /api/v1/test/tts Browser-accessible page to test text-to-speech synthesis without going through the full voice pipeline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 05:06:02 -08:00
hailin	740f8f5f88	fix: sentence splitting bug in voice pipeline TTS streaming When the first punctuation mark appeared before _MIN_SENTENCE_LEN chars, the regex search would always find it first and skip it, permanently blocking all subsequent sentence splits. Fix by advancing search_start past short matches instead of breaking out of the loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 05:03:05 -08:00
hailin	79fae0629e	chore: upgrade claude-agent-sdk to ^0.2.52 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 04:12:03 -08:00
hailin	2a150dcff5	fix: prevent error event from overriding completed status in controller Add finished guard so that once a task reaches completed/error terminal state, subsequent events don't flip the status back. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:49:21 -08:00
hailin	8e4bd573f4	fix: deduplicate text events from SDK stream_event and assistant message SDK sends text both via stream_event deltas (token-level) and assistant message (complete block). Track hasStreamedText flag per session to skip duplicate text extraction from assistant messages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:31:48 -08:00
hailin	65e68a0487	feat: streaming TTS — synthesize per-sentence as agent tokens arrive Replace batch TTS (wait for full response) with streaming approach: - _agent_generate → _agent_stream async generator (yield text chunks) - _process_speech accumulates tokens, splits on sentence boundaries - Each sentence is TTS'd and sent immediately while more tokens arrive - First audio plays within ~1s of agent response vs waiting for full text Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:14:22 -08:00
hailin	aa2a49afd4	fix: extract text from assistant message + fix event data parsing Root causes found: 1. SDK engine only emitted 'completed' without 'text' events because mapSdkMessage skipped text blocks in 'assistant' messages (assumed stream_event deltas would provide them, but SDK didn't send deltas) 2. Voice pipeline read evt_data.data.content but engine events are flat (evt_data.content) — so even if text arrived, it was never extracted Fixes: - Extract text/thinking blocks from assistant messages in SDK engine - Fix voice pipeline to read content directly from evt_data, not nested Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:01:25 -08:00
hailin	a7b42e6b98	feat: add detailed logging to agent engine and task controller Log every SDK message type, event emission, and stream lifecycle to diagnose why text events are missing in voice-agent flow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 02:56:09 -08:00
hailin	0dbe711ed3	feat: add detailed logging to voice pipeline (STT/Agent/TTS timing) Log timestamps, content, and event details at each pipeline stage to help diagnose voice-agent integration issues. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 02:47:21 -08:00
hailin	1d5c834dfe	feat: add event buffering to agent WS gateway for late subscribers Buffer stream events when no WS clients are subscribed yet, then replay them when a client subscribes. This eliminates the race condition where events are lost between task creation and WS subscription. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 02:41:38 -08:00
hailin	370e32599f	fix: subscribe to agent WS before creating task to avoid race condition The engine stream could emit text events before the voice pipeline subscribed, causing all text to be lost. Now we connect and subscribe first, then POST the task. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 02:35:57 -08:00
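
Commits 1d5c834dfe and 370e32599f both address the same race: stream events emitted between task creation and WebSocket subscription were lost. The sketch below illustrates the buffer-and-replay idea from 1d5c834dfe in Python. It is not the gateway's actual code (the log does not show it); `BufferedEventHub`, its method names, and the `max_buffered` cap are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = dict
Subscriber = Callable[[Event], None]


class BufferedEventHub:
    """Hypothetical in-memory hub that buffers events until a subscriber exists."""

    def __init__(self, max_buffered: int = 1000) -> None:
        # max_buffered is an assumed safety cap, not something the commit mentions.
        self._subscribers: Dict[str, List[Subscriber]] = defaultdict(list)
        self._buffers: Dict[str, List[Event]] = defaultdict(list)
        self._max_buffered = max_buffered

    def emit(self, task_id: str, event: Event) -> None:
        subscribers = self._subscribers.get(task_id)
        if subscribers:
            # Someone is listening: deliver immediately.
            for deliver in subscribers:
                deliver(event)
            return
        # Nobody is listening yet: keep the event for later replay.
        buffer = self._buffers[task_id]
        buffer.append(event)
        if len(buffer) > self._max_buffered:
            del buffer[0]  # drop the oldest event once the cap is exceeded

    def subscribe(self, task_id: str, deliver: Subscriber) -> None:
        # Replay buffered events in order, then register for live events.
        for event in self._buffers.pop(task_id, []):
            deliver(event)
        self._subscribers[task_id].append(deliver)


hub = BufferedEventHub()
hub.emit("task-1", {"type": "text", "content": "hello"})  # emitted before anyone subscribes
received: List[Event] = []
hub.subscribe("task-1", received.append)                  # late subscriber gets the replay
hub.emit("task-1", {"type": "completed"})                 # live delivery afterwards
```

With this shape, a late subscriber sees the early `text` event followed by `completed`. Subscribing before creating the task (370e32599f) remains a useful complement, since it avoids relying on the buffer at all.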


108 Commits
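
Commit 740f8f5f88 in the table above describes a sentence-splitting bug: a punctuation mark appearing before `_MIN_SENTENCE_LEN` characters was found first on every search and blocked all later splits. A minimal Python sketch of the described fix (advance `search_start` past a too-short match instead of breaking out of the loop) might look like this. Only `_MIN_SENTENCE_LEN` and `search_start` come from the commit message; the threshold value, the punctuation set, and the function name `split_sentences` are assumptions.

```python
import re
from typing import List, Tuple

_MIN_SENTENCE_LEN = 4  # assumed value; the commit names the constant but not its size
_SENTENCE_END = re.compile(r"[。！？!?；;\n]")  # assumed punctuation set


def split_sentences(buffer: str) -> Tuple[List[str], str]:
    """Return (complete sentences, unfinished remainder) for a token buffer."""
    sentences: List[str] = []
    search_start = 0
    while True:
        match = _SENTENCE_END.search(buffer, search_start)
        if match is None:
            break
        end = match.end()
        if end < _MIN_SENTENCE_LEN:
            # Too short to speak on its own: look for the next mark
            # instead of giving up, which was the original bug.
            search_start = end
            continue
        sentence = buffer[:end].strip()
        if sentence:
            sentences.append(sentence)
        buffer = buffer[end:]
        search_start = 0
    return sentences, buffer
```

For example, with the assumed threshold of 4, `split_sentences("好。今天天气不错。还有")` yields one sentence, `"好。今天天气不错。"`, and leaves `"还有"` in the buffer for the next batch of tokens.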