hailin/it0 - it0 - AI Wolves Team

Commit Graph

Author	SHA1	Message	Date
hailin	da17488389	feat: voice mode event filtering — skip tool/thinking events for Agent SDK 1. Remove on_enter greeting entirely (no more race condition) 2. voice-agent sends voiceMode: true when engine_type is claude_agent_sdk 3. AgentController.runTaskStream() filters thinking, tool_use, tool_result events in voice mode — only text, completed, error reach the client 4. Detailed logging: each event logged with [FILTERED-voice] tag when skipped Claude API mode is completely unaffected (voiceMode defaults to false). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 02:56:41 -08:00
hailin	7c9fabd891	fix: avoid Agent SDK race on greeting + clear session on abort 1. Change on_enter greeting from generate_reply() to session.say() with a static message — avoids spawning an Agent SDK task just for a greeting, which caused a race condition when the user speaks before it completes. 2. Clear agent session ID when receiving abort/exit errors so the next task starts a fresh session instead of trying to resume a dead process. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 02:22:52 -08:00
hailin	a78e2cd923	chore: add detailed engine type logging for verification Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 02:18:29 -08:00
hailin	59a3e60b82	feat: add engine type selection (Agent SDK / Claude API) for voice calls Full-stack implementation allowing users to choose between Claude Agent SDK (default, with tool approval, skill injection, session resume) and Claude API (direct, lower latency) in Flutter settings. Agent SDK mode wraps prompts with voice-conversation instructions for concise spoken Chinese output. Data flow: Flutter Settings → SharedPreferences → POST /livekit/token → RoomAgentDispatch metadata → voice-agent → AgentServiceLLM(engine_type) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 02:11:51 -08:00
hailin	e66c187353	fix: improve voice pipeline robustness for poor network conditions Flutter (agent_call_page.dart): - Add ConnectOptions with 15s timeouts for connection/peerConnection/iceRestart - Add RoomReconnectingEvent/RoomAttemptReconnectEvent/RoomReconnectedEvent listeners with "网络重连中" ("Reconnecting to network") UI indicator during reconnection - Add TimeoutException detection in _friendlyError() voice-agent (agent.py): - Wrap entrypoint() in try-except with full traceback logging - Register room disconnect listener to close httpx clients (instead of finally block, since session.start() returns while the session runs in the background) - Add asyncio import for ensure_future cleanup voice-agent LLM proxy (agent_llm.py): - Add retry with exponential backoff (max 2 retries, 1s/3s delays) for network errors (ConnectError/ConnectTimeout/OSError) and WS InvalidStatusCode - Extract _do_stream() method for single-attempt logic - Add WebSocket connection params: open_timeout=10, ping_interval=20, ping_timeout=10 for keepalive and faster dead-connection detection - Use granular httpx.Timeout(connect=10, read=30, write=10, pool=10) - Increase WS recv timeout from 5s to 30s to reduce unnecessary loops Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 23:34:55 -08:00
hailin	32922c6819	fix: adjust TTS default instructions for faster speech tempo Changed from "语速适中" ("moderate pace") to "语速稍快，简洁干练" ("slightly faster pace, concise and crisp") to reduce perceived latency in voice conversations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 22:09:32 -08:00
hailin	186234bae2	fix: increase STT silence_duration_ms to prevent choppy transcription Default silence_duration_ms=350 is too aggressive for Chinese speech, causing sentences to be fragmented into 1-3 character chunks. Increase to 800ms and raise VAD threshold to 0.6 so the STT waits longer before finalizing a turn, producing complete sentences for LLM processing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 18:37:13 -08:00
hailin	a5c95b460a	fix: patch aiohttp SSL verification for OpenAI Realtime STT WebSocket The OpenAI Realtime STT uses aiohttp WebSocket connections (not httpx), so the existing httpx verify=False fix does not apply. LiveKit's http_context creates aiohttp.TCPConnector without ssl=False, causing SSL certificate verification errors when OPENAI_BASE_URL points to a proxy with a self-signed certificate. Monkey-patch http_context._new_session_ctx to inject ssl=False into the aiohttp connector, fixing the "CERTIFICATE_VERIFY_FAILED" error for Realtime STT WebSocket connections. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 18:29:59 -08:00
hailin	5460be8c04	feat: add TTS voice and style settings to Flutter app Add user-configurable TTS voice and tone style settings that flow from the Flutter app through the backend to the voice-agent at call time. ## Flutter App (it0_app) ### Domain Layer - app_settings.dart: Add `ttsVoice` (default: 'coral') and `ttsStyle` (default: '') fields to AppSettings entity with copyWith support ### Data Layer - settings_datasource.dart: Add SharedPreferences keys `settings_tts_voice` and `settings_tts_style` for local persistence in loadSettings(), saveSettings(), and clearSettings() ### Presentation Layer - settings_providers.dart: Add `setTtsVoice()` and `setTtsStyle()` methods to SettingsNotifier for Riverpod state management - settings_page.dart: Add "语音" ("Voice") settings group between Notifications and Security groups with: - Voice picker: 13 OpenAI voices with gender/style labels (e.g. "女 · 温暖" ("female, warm"), "男 · 沉稳" ("male, steady"), "中性" ("neutral")) in a BottomSheet - Style picker: 5 presets (专业干练/温柔耐心/轻松活泼/严肃正式/科幻AI, i.e. professional and crisp / gentle and patient / relaxed and lively / serious and formal / sci-fi AI) as ChoiceChips + custom text input field + reset button ### Call Flow - agent_call_page.dart: Send `tts_voice` and `tts_style` in the POST body when requesting a LiveKit token at call initiation ## Backend ### voice-service (Python/FastAPI) - livekit_token.py: Accept optional `tts_voice` and `tts_style` via Pydantic TokenRequest body model; embed them in RoomAgentDispatch metadata JSON alongside auth_header (backward compatible) ### voice-agent (Python/LiveKit Agents) - agent.py: Extract `tts_voice` and `tts_style` from ctx.job.metadata; use them when creating openai_plugin.TTS(): user-selected voice overrides config default, user-selected style overrides default instructions. Falls back to config defaults when not provided. ## Data Flow Flutter Settings → SharedPreferences → POST /livekit/token body → voice-service embeds in RoomAgentDispatch metadata → voice-agent reads from ctx.job.metadata → TTS creation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 09:38:15 -08:00
hailin	705647d732	feat: upgrade TTS to gpt-4o-mini-tts with voice instructions - Switch from tts-1 to gpt-4o-mini-tts for lower latency and better quality - Change voice from alloy to coral - Add Chinese speech instructions for natural tone control Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 08:19:05 -08:00
hailin	ba83e433d3	feat: enable OpenAI Realtime STT for streaming speech recognition Switch from batch STT (gpt-4o-transcribe via /audio/transcriptions) to streaming Realtime API (WebSocket). This eliminates the ~2s batch upload+process latency per utterance. Also updated nginx proxy on 67.223.119.33 to support WebSocket upgrade for /v1/realtime endpoint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 07:49:25 -08:00
hailin	e302891f16	fix: disable SSL verify for self-signed OpenAI proxy + handle no-user-msg - Pass httpx.AsyncClient(verify=False) to OpenAI STT/TTS to support self-signed certificate on OPENAI_BASE_URL proxy - Handle generate_reply calls with no user message by falling back to system/developer instructions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:39:49 -08:00
hailin	4d47c6a955	fix: remove wait_for_participant — room not connected in rtc_session mode In livekit-agents v1.x @server.rtc_session() pattern, ctx.room is not yet connected when entrypoint is called. session.start() handles room connection internally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:15:37 -08:00
hailin	2112445191	fix: voice-agent crash — add room I/O options and filter AgentConfigUpdate - Add room_input_options/room_output_options to session.start() so agent binds audio I/O and stays in the room - Add wait_for_participant() before starting session - Filter AgentConfigUpdate items in agent_llm.py (no 'role' attribute) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 21:08:07 -08:00
hailin	00be878a95	fix: refactor voice-agent to official LiveKit v1.x AgentServer pattern Replace deprecated WorkerOptions(entrypoint_fnc=...) with AgentServer() + @server.rtc_session() decorator. Use server.setup_fnc for prewarm. Remove manual ctx.connect() and ctx.wait_for_participant() calls that prevented the pipeline from properly wiring up VAD→STT→LLM→TTS. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 12:31:31 -08:00
hailin	75b14d5200	fix: use RoomOptions instead of deprecated RoomInputOptions RoomInputOptions is deprecated in livekit-agents 1.4.x. Switch to RoomOptions with explicit audio_input/audio_output enabled. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 11:32:36 -08:00
hailin	23b5bce983	fix: extract auth header from job.metadata instead of agent_dispatch LiveKit passes RoomAgentDispatch metadata through as job.metadata (protobuf field), not via a separate agent_dispatch object. Also use room_io.RoomInputOptions for participant targeting (livekit-agents 1.x). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 11:04:02 -08:00
hailin	f1d50e43f1	fix: update AgentSession.start() for livekit-agents 1.x API livekit-agents 1.x removed the 'participant' parameter from AgentSession.start(). Use room_input_options with participant_identity instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 10:31:04 -08:00
hailin	112c445143	fix: resolve websockets version conflict and use CPU-only torch - Upgrade websockets from ==12.0 to >=13.0 (openai[realtime] requires >=13) - Install torch CPU-only build separately in Dockerfile to avoid ~2GB CUDA download - Remove torch from requirements.txt (installed via --index-url cpu wheel) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 09:02:31 -08:00
hailin	94a14b3104	feat: migrate voice call from WebSocket/PCM to LiveKit WebRTC Real-time voice conversation architecture migration: WebSocket → LiveKit WebRTC ## Background The original voice call architecture streamed raw PCM over a FastAPI WebSocket and ran the pipeline serially (VAD → batch STT → Agent → sentence accumulation → batch TTS), with a time-to-first-audio of about 6 seconds. After migrating to the LiveKit Agents framework, WebRTC transport plus pipeline parallelism is expected to reduce latency to 1.5-2 seconds. ## Architecture Flutter App (livekit_client) ←── WebRTC (Opus/UDP) ──→ LiveKit Server (self-hosted, Go) ←──→ Voice Agent (Python, LiveKit Agents SDK) ├─ VAD (Silero) ├─ STT (faster-whisper / OpenAI) ├─ LLM (custom plugin → agent-service) └─ TTS (Kokoro / OpenAI) Key design: the LLM does not call the Claude API directly; a custom plugin proxies to the existing agent-service, preserving capabilities such as Tool Use, session history, and tenant isolation. ## New services ### voice-agent (packages/services/voice-agent/) LiveKit Agent Worker, containing: - agent.py: entry point; prewarm() preloads models, entrypoint() orchestrates the session - plugins/agent_llm.py: custom LLM plugin that proxies the agent-service API - POST /api/v1/agent/tasks creates a task - WS /ws/agent subscribes to streaming events (stream_event) - Reuses session_id across turns to keep conversation context - plugins/whisper_stt.py: local faster-whisper STT (batch recognition) - plugins/kokoro_tts.py: local Kokoro-82M TTS (24kHz PCM) - config.py: pydantic-settings configuration ### LiveKit Server (deploy/docker/) - livekit.yaml: signaling port 7880, RTC TCP 7881, UDP 50000-50200 - docker-compose.yml: adds livekit-server + voice-agent containers ### LiveKit Token endpoint - voice-service/src/api/livekit_token.py: POST /api/v1/voice/livekit/token generates a Room JWT and embeds auth_header in the AgentDispatch metadata ## Flutter client changes - agent_call_page.dart: reduced from ~814 lines to ~380 lines - Replaced: WebSocketChannel, AudioRecorder, PcmPlayer, manual heartbeat/reconnect - Now uses: Room.connect(), setMicrophoneEnabled(true), LiveKit event listeners - Waveform animation now driven by participant.audioLevel - pubspec.yaml: add livekit_client: ^2.3.0 - app_config.dart: add livekitUrl field - api_endpoints.dart: add livekitToken endpoint ## Configuration (environment variables) - STT_PROVIDER: local (default, faster-whisper) / openai - TTS_PROVIDER: local (default, Kokoro) / openai - WHISPER_MODEL: base (default) / small / medium / large - WHISPER_LANGUAGE: zh (default) - KOKORO_VOICE: zf_xiaoxiao (default) - DEVICE: cpu (default) / cuda ## Unchanged - agent-service: not modified at all; voice-agent calls it through the existing API - voice-service core: pipeline/STT/TTS/VAD retained (as Twilio fallback) - Kong gateway: existing routes unchanged - Database: no schema changes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 08:55:33 -08:00
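
Note on e66c187353: the message describes retrying the LLM proxy stream with backoff (max 2 retries, 1s/3s delays) around a single-attempt `_do_stream()` method. The actual code in agent_llm.py is not shown on this page, so the following is only a minimal sketch of that pattern. The name `stream_with_retry`, its parameters, and the use of plain `OSError` in place of the httpx/websockets exception types are assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


async def stream_with_retry(
    do_stream: Callable[[], Awaitable[T]],
    delays: Sequence[float] = (1.0, 3.0),  # 1s/3s, as stated in the commit message
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),  # assumption: stand-in for network errors
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run one attempt, then retry once per entry in `delays` on network errors."""
    for attempt in range(len(delays) + 1):
        try:
            return await do_stream()
        except retry_on:
            if attempt == len(delays):
                raise  # retries exhausted: surface the last error to the caller
            await sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With the default delays this makes at most three attempts in total. Passing `sleep` as a parameter is a design choice of the sketch so the delays can be replaced in tests.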

20 Commits