Full-stack implementation allowing users to choose between Claude Agent SDK
(default, with tool approval, skill injection, session resume) and Claude API
(direct, lower latency) in Flutter settings. Agent SDK mode wraps prompts with
voice-conversation instructions for concise spoken Chinese output.
Data flow: Flutter Settings → SharedPreferences → POST /livekit/token →
RoomAgentDispatch metadata → voice-agent → AgentServiceLLM(engine_type)
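A minimal sketch of the prompt wrapping described above. The helper name, the instruction text, and the `"agent_sdk"` engine value are illustrative, not the actual implementation:

```python
# Hypothetical sketch: wrap a user utterance with voice-conversation
# instructions before handing it to the Agent SDK engine.
VOICE_INSTRUCTIONS = (
    "You are in a voice conversation. Reply in concise spoken Chinese; "
    "avoid markdown, lists, and long code blocks."
)

def wrap_for_voice(prompt: str, engine_type: str) -> str:
    """Only the Agent SDK engine gets the voice wrapper; the direct
    Claude API path sends the prompt unchanged."""
    if engine_type == "agent_sdk":
        return f"{VOICE_INSTRUCTIONS}\n\n{prompt}"
    return prompt
```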
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add user-configurable TTS voice and tone style settings that flow from
the Flutter app through the backend to the voice-agent at call time.
## Flutter App (it0_app)
### Domain Layer
- app_settings.dart: Add `ttsVoice` (default: 'coral') and `ttsStyle`
(default: '') fields to AppSettings entity with copyWith support
### Data Layer
- settings_datasource.dart: Add SharedPreferences keys
`settings_tts_voice` and `settings_tts_style` for local persistence
in loadSettings(), saveSettings(), and clearSettings()
### Presentation Layer
- settings_providers.dart: Add `setTtsVoice()` and `setTtsStyle()`
methods to SettingsNotifier for Riverpod state management
- settings_page.dart: Add "语音" (Voice) settings group between the
  Notifications and Security groups with:
  - Voice picker: 13 OpenAI voices with gender/style labels
    (e.g. "女 · 温暖" female · warm, "男 · 沉稳" male · calm,
    "中性" neutral) in a BottomSheet
  - Style picker: 5 presets (专业干练 professional, 温柔耐心 gentle and
    patient, 轻松活泼 light and lively, 严肃正式 serious and formal,
    科幻AI sci-fi AI) as ChoiceChips plus a custom text input field
    and a reset button
### Call Flow
- agent_call_page.dart: Send `tts_voice` and `tts_style` in the POST
body when requesting a LiveKit token at call initiation
## Backend
### voice-service (Python/FastAPI)
- livekit_token.py: Accept optional `tts_voice` and `tts_style` via
Pydantic TokenRequest body model; embed them in RoomAgentDispatch
metadata JSON alongside auth_header (backward compatible)
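A sketch of the token endpoint logic, using the livekit-api classes named in a later commit below (AccessToken, VideoGrants, RoomConfiguration, RoomAgentDispatch). The agent name "voice-agent" and the handler shape are assumptions:

```python
import json
from livekit import api
from pydantic import BaseModel

class TokenRequest(BaseModel):
    room: str
    identity: str
    tts_voice: str | None = None  # optional: omitted by older clients
    tts_style: str | None = None

def build_token(req: TokenRequest, auth_header: str) -> str:
    # Embed per-call settings in the dispatch metadata; absent fields
    # are simply left out, keeping old clients working.
    metadata = {"auth_header": auth_header}
    if req.tts_voice:
        metadata["tts_voice"] = req.tts_voice
    if req.tts_style:
        metadata["tts_style"] = req.tts_style
    return (
        api.AccessToken()  # reads LIVEKIT_API_KEY / LIVEKIT_API_SECRET
        .with_identity(req.identity)
        .with_grants(api.VideoGrants(room_join=True, room=req.room))
        .with_room_config(api.RoomConfiguration(agents=[
            api.RoomAgentDispatch(agent_name="voice-agent",  # assumed name
                                  metadata=json.dumps(metadata))]))
        .to_jwt()
    )
```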
### voice-agent (Python/LiveKit Agents)
- agent.py: Extract `tts_voice` and `tts_style` from ctx.job.metadata;
use them when creating openai_plugin.TTS() — user-selected voice
overrides config default, user-selected style overrides default
instructions. Falls back to config defaults when not provided.
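A sketch of the metadata extraction on the agent side, assuming the openai plugin's TTS accepts `voice` and `instructions` kwargs and that `config` exposes the defaults under these names:

```python
import json
from livekit.agents import JobContext
from livekit.plugins import openai as openai_plugin

def build_tts(ctx: JobContext, config) -> openai_plugin.TTS:
    # ctx.job.metadata carries the JSON written by voice-service.
    meta = json.loads(ctx.job.metadata or "{}")
    voice = meta.get("tts_voice") or config.tts_voice        # config default
    style = meta.get("tts_style") or config.tts_instructions # default instructions
    return openai_plugin.TTS(voice=voice, instructions=style)
```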
## Data Flow
Flutter Settings → SharedPreferences → POST /livekit/token body →
voice-service embeds in RoomAgentDispatch metadata →
voice-agent reads from ctx.job.metadata → TTS creation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The livekit package is the client SDK and doesn't include the server-side
API module. Switch to livekit-api which provides AccessToken, VideoGrants,
RoomAgentDispatch, and RoomConfiguration needed for token generation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Server side (session_router.py):
- /reconnect now accepts sessions in "active" state (not just "disconnected")
- When client reconnects to an active session, the old WebSocket/pipeline is
automatically replaced when the new WebSocket connects
- Only truly terminal states (e.g. "ended") return 409
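A minimal sketch of the relaxed state check; the `sessions` store and its field names are illustrative:

```python
from fastapi import APIRouter, HTTPException

router = APIRouter()
TERMINAL_STATES = {"ended"}  # only truly finished sessions are rejected

@router.post("/sessions/{session_id}/reconnect")
async def reconnect(session_id: str):
    session = sessions.get(session_id)  # hypothetical in-memory store
    if session is None:
        raise HTTPException(404, "session not found")
    if session.state in TERMINAL_STATES:
        raise HTTPException(409, "session already ended")
    # "active" is allowed: the stale WebSocket/pipeline is replaced
    # when the new WebSocket connects.
    return {"websocket_url": session.websocket_url}
```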
Flutter side (agent_call_page.dart):
- Distinguish terminal errors (404 session gone, 409 ended) from transient
errors (network timeout, server unreachable) in reconnect loop
- Terminal errors break immediately instead of wasting retry attempts
- Extract _connectWebSocket() helper for cleaner reconnect flow
- Add DioException handling for proper HTTP status code inspection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OpenAI TTS returns 24kHz audio which Android MediaPlayer can't play
via FlutterSound's pcm16WAV codec. Request raw PCM and resample to
16kHz before wrapping in WAV header, matching the local TTS format.
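One way to do the conversion (a sketch, not the exact code): request `response_format="pcm"` from OpenAI (24 kHz 16-bit mono), resample with `audioop` (stdlib up to Python 3.12; a numpy resampler works too), and wrap in a WAV header:

```python
import audioop
import io
import wave

def openai_pcm_to_wav16k(pcm_24k: bytes) -> bytes:
    """Resample 24 kHz 16-bit mono PCM to 16 kHz and wrap it in a
    WAV header, matching the local TTS output format."""
    pcm_16k, _ = audioop.ratecv(pcm_24k, 2, 1, 24000, 16000, None)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(16000)
        wav.writeframes(pcm_16k)
    return buf.getvalue()
```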
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Local /synthesize and /transcribe endpoints now auto-load Kokoro/Whisper
models on first call instead of returning 503 when not pre-loaded at
startup. This allows switching between Local and OpenAI providers in the
Flutter test page without requiring server restart.
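The lazy-load pattern, sketched for the TTS side (`load_kokoro` is a hypothetical blocking loader; the lock prevents a duplicate load when two first requests race):

```python
import asyncio

_tts_model = None
_tts_lock = asyncio.Lock()

async def get_tts_model():
    """Load Kokoro on first use instead of returning 503."""
    global _tts_model
    if _tts_model is None:
        async with _tts_lock:
            if _tts_model is None:  # re-check after acquiring the lock
                # Run the blocking load off the event loop.
                _tts_model = await asyncio.to_thread(load_kokoro)
    return _tts_model
```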
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add STT_PROVIDER/TTS_PROVIDER config (local or openai) in settings
- Pipeline uses OpenAI API for STT/TTS when provider is "openai"
- Skip loading local models (Kokoro/faster-whisper) when using OpenAI
- VAD (Silero) always loads for speech detection
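A sketch of the config and conditional loading; loader names are placeholders (pydantic-settings maps `STT_PROVIDER`/`TTS_PROVIDER` env vars to the fields case-insensitively):

```python
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    stt_provider: str = "local"  # STT_PROVIDER=local|openai
    tts_provider: str = "local"  # TTS_PROVIDER=local|openai

settings = Settings()

def load_models():
    load_silero_vad()  # VAD always loads for speech detection
    if settings.stt_provider == "local":
        load_faster_whisper()  # skipped when OpenAI handles STT
    if settings.tts_provider == "local":
        load_kokoro()          # skipped when OpenAI handles TTS
```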
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add openai package to voice-service requirements
- Add /api/v1/test/tts/synthesize-openai (tts-1/tts-1-hd/gpt-4o-mini-tts)
- Add /api/v1/test/stt/transcribe-openai (gpt-4o-transcribe/whisper-1)
- Add OPENAI_API_KEY and OPENAI_BASE_URL env vars to voice-service
- Flutter test page: SegmentedButton to toggle Local/OpenAI provider
- All endpoints maintain same response format for easy comparison
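A simplified sketch of the two test endpoints (request schemas here are query-parameter shorthand, not the real models; `AsyncOpenAI` picks up OPENAI_API_KEY and OPENAI_BASE_URL from the environment):

```python
from fastapi import APIRouter, Response, UploadFile
from openai import AsyncOpenAI

router = APIRouter(prefix="/api/v1/test")
client = AsyncOpenAI()  # reads OPENAI_API_KEY / OPENAI_BASE_URL

@router.post("/tts/synthesize-openai")
async def synthesize_openai(text: str, model: str = "tts-1",
                            voice: str = "alloy"):
    # model: tts-1 / tts-1-hd / gpt-4o-mini-tts
    resp = await client.audio.speech.create(model=model, voice=voice,
                                            input=text)
    return Response(content=resp.content, media_type="audio/mpeg")

@router.post("/stt/transcribe-openai")
async def transcribe_openai(file: UploadFile, model: str = "whisper-1"):
    # model: gpt-4o-transcribe / whisper-1
    result = await client.audio.transcriptions.create(
        model=model, file=(file.filename, await file.read()))
    return {"text": result.text}
```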
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- STT: record from mic or upload audio file → faster-whisper transcription
- Round-trip: record → STT → TTS → playback (full pipeline test)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Browser-accessible page to test text-to-speech synthesis without
going through the full voice pipeline.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the first punctuation mark appeared before _MIN_SENTENCE_LEN chars,
the regex search would always find it first and skip it, permanently
blocking all subsequent sentence splits. Fix by advancing search_start
past short matches instead of breaking out of the loop.
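A sketch of the fixed splitter (punctuation set, minimum length, and function name are illustrative):

```python
import re

_MIN_SENTENCE_LEN = 10
_SENT_END = re.compile(r"[。！？!?…]")

def split_sentences(buffer: str) -> tuple[list[str], str]:
    """Split complete sentences off the front of `buffer`.
    Punctuation before _MIN_SENTENCE_LEN chars is skipped by advancing
    search_start past it; the old code broke out of the loop here,
    permanently blocking later splits."""
    sentences, search_start = [], 0
    while (m := _SENT_END.search(buffer, search_start)):
        end = m.end()
        if end < _MIN_SENTENCE_LEN:
            search_start = end  # too short: keep scanning, don't give up
            continue
        sentences.append(buffer[:end])
        buffer, search_start = buffer[end:], 0
    return sentences, buffer  # remainder stays buffered
```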
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace batch TTS (wait for full response) with streaming approach:
- _agent_generate → _agent_stream async generator (yield text chunks)
- _process_speech accumulates tokens, splits on sentence boundaries
- Each sentence is TTS'd and sent immediately while more tokens arrive
- First audio plays within ~1s of agent response vs waiting for full text
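The shape of the streaming loop (a sketch: `_agent_stream` and the splitter are from this and the previous commit; `synth_and_send` is a hypothetical TTS-and-send helper):

```python
async def _process_speech(transcript: str) -> None:
    buffer = ""
    async for chunk in _agent_stream(transcript):  # yields text chunks
        buffer += chunk
        done, buffer = split_sentences(buffer)     # boundary split (fix above)
        for sentence in done:
            # TTS and send this sentence while more tokens still arrive.
            await synth_and_send(sentence)
    if buffer.strip():
        await synth_and_send(buffer)               # flush the trailing fragment
```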
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root causes found:
1. SDK engine only emitted 'completed' without 'text' events because
mapSdkMessage skipped text blocks in 'assistant' messages (assumed
stream_event deltas would provide them, but SDK didn't send deltas)
2. Voice pipeline read evt_data.data.content but engine events are flat
(evt_data.content) — so even if text arrived, it was never extracted
Fixes:
- Extract text/thinking blocks from assistant messages in SDK engine
- Fix voice pipeline to read content directly from evt_data, not nested
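A schematic sketch of the first fix, assuming dict-shaped SDK messages (the real engine's types and event schema may differ):

```python
def map_sdk_message(msg: dict):
    """Emit flat engine events; assistant text/thinking blocks are no
    longer skipped while waiting for stream_event deltas that never come."""
    if msg.get("type") == "assistant":
        for block in msg.get("message", {}).get("content", []):
            if block.get("type") in ("text", "thinking"):
                # Flat event shape: the pipeline reads evt["content"],
                # not evt["data"]["content"].
                yield {"type": "text",
                       "content": block.get("text") or block.get("thinking")}
```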
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Log timestamps, content, and event details at each pipeline stage
to help diagnose voice-agent integration issues.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The engine stream could emit text events before the voice pipeline
subscribed, causing all text to be lost. Now we connect and subscribe
first, then POST the task.
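The ordering, sketched with generic clients (URLs, payload, and `handle` are placeholders):

```python
import httpx
import websockets

async def start_voice_task(ws_url: str, task_url: str, payload: dict):
    # 1. Connect and subscribe BEFORE creating the task,
    #    so no early text events can be emitted unheard.
    async with websockets.connect(ws_url) as ws:
        async with httpx.AsyncClient() as http:
            await http.post(task_url, json=payload)  # 2. then POST the task
        async for event in ws:                       # 3. nothing is missed
            handle(event)
```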
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Voice calls now use the same agent task + WS subscription flow as the
chat UI, enabling tool use and command execution during voice sessions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: Pipecat's WebsocketServerTransport creates its own WebSocket
server on (host,port) and expects FrameProcessor subclasses. Our code was
passing a FastAPI WebSocket object as 'host' and using plain STT/TTS/VAD
service classes that aren't FrameProcessors. The pipeline crashed immediately
when receiving audio, causing "disconnects when speaking".
Changes:
- **base_pipeline.py**: Complete rewrite: replaced the Pipecat Pipeline
  with a direct async loop, WebSocket → VAD → STT → Claude LLM → TTS →
  WebSocket (see the sketch after this list). Supports barge-in
  (interrupting TTS when the user speaks), audio chunking, and
  24kHz→16kHz TTS resampling.
- **session_router.py**: Pass WebSocket directly to pipeline instead of
wrapping in AppTransport.
- **app_transport.py**: Deprecated (no longer needed).
- **kokoro_service.py**: Fix misaki compatibility (MutableToken→MToken
rename), use correct Chinese voice 'zf_xiaoxiao', handle torch tensors.
- **main.py**: Apply misaki monkey-patch before importing kokoro.
- **settings.py**: Change default TTS voice from 'zh_female_1' (non-existent)
to 'zf_xiaoxiao' (valid Kokoro-82M Chinese female voice).
- **requirements.txt**: Remove pipecat-ai dependency, pin kokoro==0.3.5 +
misaki==0.7.17, add Chinese NLP deps (pypinyin, cn2an, jieba, ordered-set).
- **agent_call_page.dart**: Wrap each cleanup step in try/catch to ensure
Navigator.pop() always executes after call ends. Add 3s timeout on session
delete request.
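A sketch of the direct loop's shape; all service objects (`vad`, `stt`, `llm`) and the `speak` helper are placeholders for the real implementations:

```python
import asyncio

async def run_pipeline(ws, vad, stt, llm):
    """Direct async loop replacing the Pipecat pipeline:
    WebSocket -> VAD -> STT -> Claude LLM -> TTS -> WebSocket."""
    speech, tts_task = bytearray(), None
    while True:
        chunk = await ws.receive_bytes()          # 16 kHz 16-bit mono PCM
        if vad.is_speech(chunk):
            if tts_task and not tts_task.done():
                tts_task.cancel()                 # barge-in: user interrupts TTS
            speech.extend(chunk)
        elif speech:                              # silence after speech: utterance done
            text = await stt.transcribe(bytes(speech))
            speech.clear()
            # speak(): stream the LLM reply through TTS back to the
            # client in chunks (includes the 24kHz -> 16kHz resample).
            tts_task = asyncio.create_task(speak(ws, llm.stream(text)))
```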
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Required by FastAPI for form/file upload parsing. A missing dependency
can cause import errors and container restart loops.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses reliability gaps in the real-time voice WebSocket connection
between Flutter client and Python voice-service backend.
Backend (voice-service):
- Heartbeat: new _heartbeat_sender coroutine sends JSON ping text frames
  every 15s alongside the Pipecat pipeline; a failed send means the
  connection is dead (see the sketch after this list)
- Session preservation: on WebSocket disconnect, sessions are now marked
"disconnected" with a timestamp instead of being deleted, allowing
reconnection within a configurable TTL (default 60s)
- Reconnect endpoint: POST /sessions/{id}/reconnect verifies the session
is alive and in "disconnected" state, returns fresh websocket_url
- Reconnect-aware WS handler: detects "disconnected" sessions, cancels
stale pipeline tasks, creates a new pipeline, sends "session.resumed"
- Background cleanup: asyncio loop every 30s removes sessions that have
been disconnected longer than session_ttl
- Structured event protocol: text frames = JSON control messages
(ping/pong/session.resumed/session.ended/error), binary = PCM audio
- New settings: session_ttl (60s), heartbeat_interval (15s),
heartbeat_timeout (45s)
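A sketch of the heartbeat coroutine (FastAPI/Starlette WebSocket API; the shutdown handling on failure is simplified):

```python
import asyncio
import json

HEARTBEAT_INTERVAL = 15  # settings.heartbeat_interval

async def _heartbeat_sender(ws):
    """Send a JSON ping text frame every 15s alongside the pipeline;
    a failed send means the connection is dead."""
    while True:
        await asyncio.sleep(HEARTBEAT_INTERVAL)
        try:
            await ws.send_text(json.dumps({"type": "ping"}))
        except Exception:
            break  # dead connection: caller marks the session disconnected
```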
Flutter (agent_call_page.dart):
- Heartbeat monitoring: tracks last server ping timestamp, triggers
reconnect if no ping received in 45s (3 missed intervals)
- Auto-reconnect: exponential backoff (1s→2s→4s→8s→16s), max 5 attempts;
calls /reconnect endpoint to verify session, rebuilds WebSocket,
resets audio buffer, restarts heartbeat
- Reconnecting UI: yellow warning banner "重新连接中... (N/5)"
  ("Reconnecting... (N/5)") with spinner overlay during reconnection
  attempts
- WebSocket data routing: _onWsData distinguishes String (JSON control)
from binary (audio) frames, handles ping/session.resumed/session.ended
- User-initiated disconnect guard: _userEndedCall flag prevents reconnect
attempts when user intentionally hangs up
- session_id field compatibility: supports session_id/sessionId/id
Flutter (pcm_player.dart):
- Jitter buffer: queues incoming PCM chunks, starts playback only after
accumulating 4800 bytes (150ms at 16kHz 16-bit mono) to smooth out
network timing variance
- reset() method: clears buffer on reconnect to discard stale audio
- Buffer underrun handling: re-enters buffering phase if queue empties
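The actual player is Dart; this is a language-neutral Python sketch of the same buffering logic. The threshold arithmetic: 16000 samples/s × 2 bytes × 0.15 s = 4800 bytes:

```python
JITTER_BYTES = 4800  # 150 ms at 16 kHz, 16-bit, mono

class JitterBuffer:
    def __init__(self):
        self.queue = bytearray()
        self.buffering = True  # hold playback until the buffer fills

    def push(self, chunk: bytes):
        self.queue.extend(chunk)
        if self.buffering and len(self.queue) >= JITTER_BYTES:
            self.buffering = False  # enough audio queued: start playing

    def pull(self, n: int) -> bytes:
        if self.buffering:
            return b""              # still filling (also after an underrun)
        out, self.queue = self.queue[:n], self.queue[n:]
        if not self.queue:
            self.buffering = True   # underrun: re-enter buffering phase
        return bytes(out)

    def reset(self):  # called on reconnect to discard stale audio
        self.queue.clear()
        self.buffering = True
```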
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace traditional on-device speech_to_text with a modern pipeline:
- Record audio via `record` package with hardware noise suppression
- Apply GTCRN neural denoising (sherpa-onnx, ICASSP 2024, 48K params)
- Trim silence, POST to backend /voice/transcribe (faster-whisper)
Changes:
- Add /transcribe endpoint to voice-service for audio file upload
- Add SpeechEnhancer wrapper for sherpa-onnx GTCRN model (523KB)
- Rewrite chat_page.dart voice input: record → denoise → transcribe
- Keep NoiseReducer.trimSilence for silence removal only
- Upgrade record to v6.2.0, add sherpa_onnx, path_provider
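A sketch of the upload endpoint, assuming the faster-whisper API and a fixed language hint (model size and route prefix per deployment config):

```python
import tempfile

from fastapi import APIRouter, UploadFile
from faster_whisper import WhisperModel

router = APIRouter()
model = WhisperModel("base")  # size per deployment config

@router.post("/voice/transcribe")
async def transcribe(file: UploadFile):
    # Spool the upload to disk; faster-whisper accepts a file path.
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        tmp.write(await file.read())
        tmp.flush()
        segments, _info = model.transcribe(tmp.name, language="zh")
        return {"text": "".join(seg.text for seg in segments)}
```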
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Model downloads (Whisper, Kokoro, Silero VAD) are synchronous blocking
calls that prevent uvicorn from completing startup and responding to
healthchecks. Move all model loading to a daemon thread so the server
starts immediately.
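The pattern, sketched with a hypothetical `load_all_models` entry point:

```python
import threading

from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
def start_model_loading():
    # Blocking downloads (Whisper, Kokoro, Silero VAD) run off the main
    # thread so uvicorn finishes startup and healthchecks pass immediately.
    threading.Thread(target=load_all_models, daemon=True).start()
```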
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap model loading in try/except so server starts even if models fail
- Fix device env var mapping (unified 'device' field instead of 'whisper_device')
- Default Whisper model to 'base' instead of 'large-v3' (3GB) for CPU deployment
- Increase healthcheck start_period to 120s for model download time
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Dockerfile.service: fix entry point path (dist/services/{name}/src/main)
due to tsconfig paths widening rootDir during compilation
- Kong config: remove unsupported ws/wss protocols (WebSocket works
automatically over http/https in Kong 3.7)
- voice-service: fix pipecat import path for v0.0.30 API
(pipecat.transports.network.websocket_server with lowercase class names)
- voice-service: add openai dependency required by pipecat anthropic service
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
faster-whisper 1.0.0 depends on av==11.* which has no prebuilt wheels
and fails to compile. Version 1.2.1 uses av 12+ with prebuilt wheels.
Also removed unnecessary FFmpeg dev libraries from Dockerfile.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PyAV (av==11, dep of faster-whisper) requires pkg-config and
FFmpeg development headers to compile from source.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The server is on a HK network, so China mirrors are not needed. Added
build-essential for compiling native Python packages (kokoro, etc.).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
web-admin npm ci was timing out on the server. Added npmmirror.com
for npm and tsinghua mirror for pip to resolve network issues.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>