Commit Graph

11 Commits

hailin 75083f23aa debug: add TTS send_bytes logging to pipeline
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 06:19:18 -08:00
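A minimal sketch of the kind of send_bytes logging this debug commit adds, assuming the pipeline sends TTS audio over a Starlette/FastAPI WebSocket; the function and logger names are illustrative, not the repo's actual ones.

```python
import logging
import time

log = logging.getLogger("voice.tts")

async def send_tts_audio(ws, chunks) -> None:
    """Send TTS audio chunks to the client, logging size and timing per chunk."""
    total = 0
    start = time.monotonic()
    for i, chunk in enumerate(chunks):
        await ws.send_bytes(chunk)
        total += len(chunk)
        log.debug("tts send_bytes #%d: %d bytes (%d total, %.2fs elapsed)",
                  i, len(chunk), total, time.monotonic() - start)
```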
hailin c02c2a9a11 feat: add OpenAI TTS/STT provider support in voice pipeline
- Add STT_PROVIDER/TTS_PROVIDER config (local or openai) in settings
- Pipeline uses OpenAI API for STT/TTS when provider is "openai"
- Skip loading local models (Kokoro/faster-whisper) when using OpenAI
- VAD (Silero) always loads for speech detection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 09:27:38 -08:00
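A minimal sketch of the provider switch described above, assuming pydantic-settings for configuration; STT_PROVIDER/TTS_PROVIDER and the local model choices (faster-whisper, Kokoro, Silero VAD) come from the commit, everything else is illustrative.

```python
import torch
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    STT_PROVIDER: str = "local"   # "local" = faster-whisper, "openai" = OpenAI STT API
    TTS_PROVIDER: str = "local"   # "local" = Kokoro,         "openai" = OpenAI TTS API

settings = Settings()

# Local models are heavy, so they are only loaded when actually selected.
if settings.STT_PROVIDER == "local":
    from faster_whisper import WhisperModel
    stt_model = WhisperModel("small")   # skipped when STT_PROVIDER == "openai"
if settings.TTS_PROVIDER == "local":
    import kokoro                       # Kokoro-82M; skipped when TTS_PROVIDER == "openai"
    # ... build the Kokoro pipeline here (details depend on the pinned version)

# Silero VAD always loads: speech detection stays local in both modes.
vad_model, vad_utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
```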
hailin 740f8f5f88 fix: sentence splitting bug in voice pipeline TTS streaming
When the first punctuation mark appeared before _MIN_SENTENCE_LEN chars,
the regex search would always find it first and skip it, permanently
blocking all subsequent sentence splits. Fix by advancing search_start
past short matches instead of breaking out of the loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 05:03:05 -08:00
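A minimal sketch of the fixed splitting loop, following the commit's description; the regex, the threshold value, and the function name are illustrative, only the search_start-advance behaviour is the point.

```python
import re

_SENTENCE_END = re.compile(r"[.!?。！？]")
_MIN_SENTENCE_LEN = 12  # illustrative; the real threshold lives in the pipeline


def split_ready_sentences(buffer: str) -> tuple[list[str], str]:
    """Return (complete sentences ready for TTS, remaining buffer).

    Punctuation that lands before _MIN_SENTENCE_LEN chars is skipped by
    advancing search_start past it instead of stopping the search, so an
    early short match can no longer block every later split.
    """
    sentences: list[str] = []
    search_start = 0
    while True:
        match = _SENTENCE_END.search(buffer, search_start)
        if match is None:
            break
        end = match.end()
        if end < _MIN_SENTENCE_LEN:
            search_start = end      # the fix: keep searching past the short match
            continue
        sentences.append(buffer[:end].strip())
        buffer = buffer[end:]
        search_start = 0
    return sentences, buffer
```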
hailin 65e68a0487 feat: streaming TTS — synthesize per-sentence as agent tokens arrive
Replace batch TTS (wait for full response) with streaming approach:
- _agent_generate → _agent_stream async generator (yield text chunks)
- _process_speech accumulates tokens, splits on sentence boundaries
- Each sentence is TTS'd and sent immediately while more tokens arrive
- First audio plays within ~1s of agent response vs waiting for full text

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:14:22 -08:00
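A minimal sketch of the streaming approach, reusing the split_ready_sentences helper sketched above; agent.stream, tts.synthesize, and the WebSocket are stand-ins for the pipeline's real objects.

```python
async def _process_speech(transcript: str, agent, tts, ws) -> None:
    """Accumulate agent tokens and synthesize sentence-by-sentence."""
    buffer = ""
    async for chunk in agent.stream(transcript):       # _agent_stream: yields text chunks
        buffer += chunk
        sentences, buffer = split_ready_sentences(buffer)
        for sentence in sentences:
            audio = await tts.synthesize(sentence)      # TTS while more tokens arrive
            await ws.send_bytes(audio)                  # first audio soon after first tokens
    if buffer.strip():                                  # flush the trailing partial sentence
        await ws.send_bytes(await tts.synthesize(buffer.strip()))
```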
hailin aa2a49afd4 fix: extract text from assistant message + fix event data parsing
Root causes found:
1. SDK engine only emitted 'completed' without 'text' events because
   mapSdkMessage skipped text blocks in 'assistant' messages (assumed
   stream_event deltas would provide them, but SDK didn't send deltas)
2. Voice pipeline read evt_data.data.content but engine events are flat
   (evt_data.content) — so even if text arrived, it was never extracted

Fixes:
- Extract text/thinking blocks from assistant messages in SDK engine
- Fix voice pipeline to read content directly from evt_data, not nested

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:01:25 -08:00
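A minimal sketch of the voice-pipeline side of the fix, assuming events arrive as JSON text frames; the "type" field and "text" value are assumptions, only the flat-vs-nested "content" read follows the commit.

```python
import json


def extract_text(raw: str) -> str | None:
    """Pull assistant text out of an engine event received on the agent WS."""
    evt_data = json.loads(raw)
    if evt_data.get("type") != "text":
        return None
    # Fix: engine events are flat, so content sits directly on the event,
    # not under a nested "data" key (evt_data["data"]["content"] was always missing).
    return evt_data.get("content")
```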
hailin 0dbe711ed3 feat: add detailed logging to voice pipeline (STT/Agent/TTS timing)
Log timestamps, content, and event details at each pipeline stage
to help diagnose voice-agent integration issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:47:21 -08:00
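A minimal sketch of per-stage timing in the spirit of this commit; the real change logs timestamps, content, and event details inline, and the helper name here is illustrative.

```python
import logging
import time

log = logging.getLogger("voice.pipeline")


class StageTimer:
    """Context manager that logs how long a pipeline stage (STT / Agent / TTS) takes."""

    def __init__(self, stage: str):
        self.stage = stage

    def __enter__(self):
        self.start = time.monotonic()
        log.info("[%s] start", self.stage)
        return self

    def __exit__(self, *exc):
        log.info("[%s] done in %.2fs", self.stage, time.monotonic() - self.start)

# Usage:
#   with StageTimer("STT"):
#       text = stt.transcribe(audio)
```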
hailin 370e32599f fix: subscribe to agent WS before creating task to avoid race condition
The engine stream could emit text events before the voice pipeline
subscribed, causing all text to be lost. Now we connect and subscribe
first, then POST the task.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:35:57 -08:00
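A minimal sketch of the corrected ordering; the URLs, routes, and payload fields are illustrative, only the subscribe-before-POST sequence follows the commit.

```python
import httpx
import websockets


async def start_agent_task(agent_ws_url: str, agent_http_url: str, prompt: str):
    """Subscribe to the agent event stream first, then create the task.

    If the task is POSTed first, the engine can emit text events before
    anyone is listening and they are silently lost.
    """
    ws = await websockets.connect(agent_ws_url)          # 1. connect + subscribe
    async with httpx.AsyncClient() as client:
        resp = await client.post(agent_http_url, json={"prompt": prompt})  # 2. then create task
        resp.raise_for_status()
    return ws                                            # 3. consume events from ws
```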
hailin abf5e29419 feat: route voice pipeline through agent-service instead of direct LLM
Voice calls now use the same agent task + WS subscription flow as the
chat UI, enabling tool use and command execution during voice sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 00:47:31 -08:00
hailin 7afbd54fce fix: rewrite voice pipeline for direct WebSocket I/O, fix TTS and navigation
Root cause: Pipecat's WebsocketServerTransport creates its own WebSocket
server on (host,port) and expects FrameProcessor subclasses. Our code was
passing a FastAPI WebSocket object as 'host' and using plain STT/TTS/VAD
service classes that aren't FrameProcessors. The pipeline crashed immediately
when receiving audio, causing "disconnects when speaking".

Changes:
- **base_pipeline.py**: Complete rewrite — replaced Pipecat Pipeline with
  direct async loop: WebSocket → VAD → STT → Claude LLM → TTS → WebSocket.
  Supports barge-in (interrupt TTS when user speaks), audio chunking, and
  24kHz→16kHz TTS resampling.
- **session_router.py**: Pass WebSocket directly to pipeline instead of
  wrapping in AppTransport.
- **app_transport.py**: Deprecated (no longer needed).
- **kokoro_service.py**: Fix misaki compatibility (MutableToken→MToken
  rename), use correct Chinese voice 'zf_xiaoxiao', handle torch tensors.
- **main.py**: Apply misaki monkey-patch before importing kokoro.
- **settings.py**: Change default TTS voice from 'zh_female_1' (non-existent)
  to 'zf_xiaoxiao' (valid Kokoro-82M Chinese female voice).
- **requirements.txt**: Remove pipecat-ai dependency, pin kokoro==0.3.5 +
  misaki==0.7.17, add Chinese NLP deps (pypinyin, cn2an, jieba, ordered-set).
- **agent_call_page.dart**: Wrap each cleanup step in try/catch to ensure
  Navigator.pop() always executes after call ends. Add 3s timeout on session
  delete request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 23:34:35 -08:00
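A minimal skeleton of the direct async loop this rewrite introduces (WebSocket to VAD to STT to agent to TTS to WebSocket); vad/stt/agent/tts and their methods are stand-ins for the pipeline's real components, only the buffering and barge-in control flow follows the commit.

```python
import asyncio
from fastapi import WebSocket


async def run_pipeline(ws: WebSocket, vad, stt, agent, tts) -> None:
    """Receive audio frames, detect speech, transcribe, ask the agent, speak the reply."""
    speech: list[bytes] = []
    tts_task: asyncio.Task | None = None

    async def speak(text: str) -> None:
        for chunk in tts.synthesize_chunks(text):    # chunks already resampled 24kHz -> 16kHz
            await ws.send_bytes(chunk)

    async for frame in ws.iter_bytes():              # raw PCM frames from the client
        if vad.is_speech(frame):
            if tts_task and not tts_task.done():
                tts_task.cancel()                    # barge-in: stop TTS when the user speaks
            speech.append(frame)
        elif speech:                                 # silence after speech: utterance complete
            text = stt.transcribe(b"".join(speech))
            speech.clear()
            reply = await agent.respond(text)
            tts_task = asyncio.create_task(speak(reply))
```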
hailin 39718a9a09 fix: resolve runtime errors for NestJS, Kong, and voice-service
- Dockerfile.service: fix entry point path (dist/services/{name}/src/main)
  due to tsconfig paths widening rootDir during compilation
- Kong config: remove unsupported ws/wss protocols (WebSocket works
  automatically over http/https in Kong 3.7)
- voice-service: fix pipecat import path for v0.0.30 API
  (pipecat.transports.network.websocket_server with lowercase class names)
- voice-service: add openai dependency required by pipecat anthropic service

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 19:00:03 -08:00
hailin 00f8801d51 Initial commit: IT0 AI-powered server cluster operations platform
Full-stack monorepo with DDD + Clean Architecture:
- Backend: 7 NestJS microservices + 5 shared libraries (TypeScript)
- Mobile: Flutter app with Riverpod (Dart)
- Web Admin: Next.js dashboard with Zustand + React Query
- Voice: Python voice service (STT/TTS/VAD)
- Infra: Docker Compose, K8s manifests, Turborepo build

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 22:54:37 -08:00