hailin/it0 - it0 - AI Wolves Team

Commit Graph

Author	SHA1	Message	Date
hailin	3b0119fe09	fix: reduce STT latency, add cooldown dedup, enable diarization. Reduce debounce delay from 700ms to 400ms for faster response; add 1.5s cooldown after emitting FINAL to prevent duplicate triggers that cause LLM abort/retry cycles; enable speaker diarization (enable_diarization=True). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 03:20:12 -08:00
hailin	8ac1884ab4	fix: use debounce timer to auto-finalize Speechmatics partial transcripts. The LiveKit framework never sends FlushSentinel to the STT stream. Instead it pushes silence frames and waits for FINAL_TRANSCRIPT events. In EXTERNAL turn-detection mode, Speechmatics only emits partials. New approach: each partial transcript restarts a 700ms debounce timer. When partials stop (user stops speaking), the timer fires and promotes the last partial to FINAL_TRANSCRIPT, unblocking the pipeline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 03:08:17 -08:00
hailin	de3eccafd0	debug: add verbose logging to Speechmatics monkey-patch. Trace _patched_process_audio lifecycle and FlushSentinel handling to diagnose why final transcripts are not being promoted. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 02:50:04 -08:00
hailin	1431dc0c83	fix: directly promote partial transcripts to FINAL on FlushSentinel. VoiceAgentClient.finalize() schedules an async task chain that often loses the race against session teardown. Instead, intercept partial segments as they arrive, stash them, and synchronously emit them as FINAL_TRANSCRIPT when FlushSentinel fires. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 02:16:46 -08:00
hailin	73fd56f30a	fix: durable monkey-patch for Speechmatics finalize on flush. Move the SpeechStream._process_audio patch from container runtime into our own source code so it survives Docker rebuilds. The patch adds client.finalize() on FlushSentinel so EXTERNAL mode produces final transcripts when LiveKit's VAD detects end of speech. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 02:00:42 -08:00
hailin	6707c5048d	fix: use EXTERNAL mode + patch plugin to finalize on flush. EXTERNAL mode produces partial transcripts but livekit-plugins-speechmatics does not call finalize() when receiving a flush sentinel from the framework. A runtime monkey-patch on the plugin's SpeechStream._process_audio adds the missing finalize() call so final transcripts are generated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 01:58:25 -08:00
hailin	8f951ad31c	fix: use turn_detection=stt for Speechmatics per official docs. Speechmatics handles end-of-utterance natively via its Voice Agent API (ADAPTIVE mode). Use turn_detection="stt" on AgentSession so LiveKit delegates turn boundaries to the STT engine instead of conflicting with its own VAD-based turn detection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 01:44:10 -08:00
hailin	db4e70e30c	fix: use EXTERNAL turn detection for Speechmatics in LiveKit pipeline. ADAPTIVE mode enables a second client-side Silero VAD inside the Speechmatics SDK that conflicts with LiveKit's own VAD pipeline, causing no transcription to be returned. EXTERNAL mode delegates turn detection to LiveKit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 01:31:33 -08:00
hailin	9daf0e3b4f	fix: bypass LanguageCode normalization that maps cmn back to zh. LiveKit's LanguageCode class normalizes ISO 639-3 codes to ISO 639-1 (cmn → zh), but Speechmatics API expects "cmn" not "zh". Override the internal _stt_options.language after construction. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 01:04:20 -08:00
hailin	7292ac6ca6	fix: use cmn instead of cmn_en for Speechmatics Voice Agent API. cmn_en bilingual code not supported by Voice Agent API, causes timeout. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 00:19:50 -08:00
hailin	17ff9d3ce0	fix: use Speechmatics cmn_en bilingual model for Chinese-English mixed speech. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 23:57:26 -08:00
hailin	1d43943110	fix: correct Speechmatics STT language mapping and parameter name. Map Whisper language codes (zh→cmn, en→en, etc.) to Speechmatics codes; fix parameter name: enable_partials → include_partials. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 23:56:37 -08:00
hailin	f9c47de04b	feat: add STT provider switching (OpenAI ↔ Speechmatics) in settings. Add VoiceConfig entity/repo/service/controller in agent-service for per-tenant STT provider persistence (default: speechmatics); add Speechmatics STT plugin in voice-agent with livekit-plugins-speechmatics; modify voice-agent entrypoint for 3-way STT selection: metadata > agent-service config > env var fallback; add "Voice" section in web-admin settings page with STT provider dropdown; add i18n translations (en/zh) for voice settings; add SPEECHMATICS_API_KEY env var in docker-compose. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 22:13:18 -08:00
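
Note on f9c47de04b: the 3-way STT selection order (metadata > agent-service config > env var fallback) could look roughly like the sketch below. This is not the repository's code. The function name, the "stt_provider" key, and the STT_PROVIDER variable name are hypothetical; only the precedence order, the two provider names, and the speechmatics default come from the commit message.

```python
import os
from typing import Mapping, Optional

SUPPORTED_PROVIDERS = ("openai", "speechmatics")
DEFAULT_PROVIDER = "speechmatics"  # default named in the commit message


def resolve_stt_provider(
    metadata: Optional[Mapping[str, str]],
    tenant_config: Optional[Mapping[str, str]],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick an STT provider: metadata first, then tenant config, then env."""
    env = os.environ if env is None else env
    # Candidates in decreasing priority. Key and variable names are assumptions.
    candidates = (
        (metadata or {}).get("stt_provider"),
        (tenant_config or {}).get("stt_provider"),
        env.get("STT_PROVIDER"),
    )
    for value in candidates:
        if value and value.strip().lower() in SUPPORTED_PROVIDERS:
            return value.strip().lower()
    # Nothing usable was supplied at any level.
    return DEFAULT_PROVIDER
```

Unknown or empty values fall through to the next source rather than raising, so a bad per-room override does not block the tenant-level setting.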

13 Commits
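
Note on 8ac1884ab4 and 3b0119fe09: the debounce-and-cooldown approach described in those messages can be sketched with plain asyncio as below. This is an illustration under stated assumptions, not the patch itself. The class and callback names are hypothetical, no LiveKit or Speechmatics APIs are used, and treating the cooldown as "ignore partials that arrive shortly after a FINAL" is one possible reading of the commit message.

```python
import asyncio
import time
from typing import Callable, Optional


class PartialDebouncer:
    """Promote the last partial transcript to a final one after a quiet period."""

    def __init__(
        self,
        on_final: Callable[[str], None],
        debounce_s: float = 0.4,   # 400ms, per commit 3b0119fe09
        cooldown_s: float = 1.5,   # 1.5s, per commit 3b0119fe09
    ) -> None:
        self._on_final = on_final
        self._debounce_s = debounce_s
        self._cooldown_s = cooldown_s
        self._last_partial: Optional[str] = None
        self._timer: Optional[asyncio.TimerHandle] = None
        self._last_final_at: Optional[float] = None

    def on_partial(self, text: str) -> None:
        """Record a partial and restart the debounce timer."""
        now = time.monotonic()
        # Assumption: partials arriving during the cooldown are dropped so
        # that a trailing duplicate cannot trigger a second FINAL.
        if (
            self._last_final_at is not None
            and now - self._last_final_at < self._cooldown_s
        ):
            return
        self._last_partial = text
        if self._timer is not None:
            self._timer.cancel()
        loop = asyncio.get_running_loop()
        self._timer = loop.call_later(self._debounce_s, self._fire)

    def _fire(self) -> None:
        """Timer expired with no newer partial: emit the stashed text as final."""
        self._timer = None
        text, self._last_partial = self._last_partial, None
        if text:
            self._last_final_at = time.monotonic()
            self._on_final(text)
```

on_partial must be called from inside a running event loop, since it schedules the timer with loop.call_later.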