it0/voice-agent at e8a3e0711628c380d39602c980286259f7454f47 - it0

hailin e8a3e07116 docs: add comprehensive Speechmatics STT integration notes

Document all findings from the integration process directly in the source code for future reference:

1. Language code mapping: Speechmatics uses ISO 639-3 "cmn" for Mandarin, but LiveKit LanguageCode auto-normalizes it to "zh". Must override stt._stt_options.language after construction.
2. Turn detection modes (critical):
   - EXTERNAL: unusable. LiveKit never sends FlushSentinel, only pushes silence frames, so FINAL_TRANSCRIPT never arrives.
   - ADAPTIVE: unusable. Client-side Silero VAD conflicts with LiveKit's own VAD and produces zero transcription output.
   - SMART_TURN: correct choice. Server-side intelligent turn detection, auto-emits FINAL_TRANSCRIPT, fully compatible.
3. Speaker diarization: the is_active flag distinguishes the primary speaker from TTS echo, solving the "speaker confusion" problem.
4. Docker deployment: SPEECHMATICS_API_KEY goes in .env; watch for the COPY layer cache when rebuilding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 04:47:33 -08:00
src	docs: add comprehensive Speechmatics STT integration notes	2026-03-03 04:47:33 -08:00
Dockerfile	fix: resolve websockets version conflict and use CPU-only torch	2026-02-28 09:02:31 -08:00
requirements.txt	feat: add STT provider switching (OpenAI ↔ Speechmatics) in settings	2026-03-02 22:13:18 -08:00
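
The language-code override from item 1 of the commit message can be sketched as follows. This is a minimal illustration only: `FakeSTT`, `FakeSTTOptions`, `_NORMALIZE`, and `force_language` are hypothetical stand-ins written for this example and are not LiveKit or Speechmatics APIs. The only name taken from the commit message is the `_stt_options.language` attribute, and the stand-in merely mimics the reported "cmn" to "zh" normalization so the pattern can run without the real plugin.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the normalization the commit message reports;
# this is NOT LiveKit's actual implementation.
_NORMALIZE = {"cmn": "zh"}


@dataclass
class FakeSTTOptions:
    language: str


class FakeSTT:
    """Hypothetical stand-in for the Speechmatics STT plugin object."""

    def __init__(self, language: str) -> None:
        # Mimic the reported behaviour: "cmn" is rewritten to "zh" on construction.
        self._stt_options = FakeSTTOptions(
            language=_NORMALIZE.get(language, language)
        )


def force_language(stt, code: str):
    """Overwrite the language on the options object after construction."""
    stt._stt_options.language = code
    return stt


stt = FakeSTT(language="cmn")
normalized = stt._stt_options.language   # value after construction
force_language(stt, "cmn")
overridden = stt._stt_options.language   # value after the manual override
```

Because `_stt_options` is a private attribute, an override like this may break when the plugin is upgraded, so it is worth re-checking after any dependency bump in requirements.txt.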