13 Commits

Author SHA1 Message Date
hailin 3b0119fe09 fix: reduce STT latency, add cooldown dedup, enable diarization
- Reduce debounce delay from 700ms to 400ms for faster response
- Add 1.5s cooldown after emitting FINAL to prevent duplicate triggers
  that cause LLM abort/retry cycles
- Enable speaker diarization (enable_diarization=True)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 03:20:12 -08:00
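The 1.5s cooldown described in commit 3b0119fe09 can be sketched as a small gate that suppresses any FINAL emitted too soon after the previous one. This is a minimal sketch, not the code from the commit: the class name `FinalCooldown`, the method `try_emit`, and the injectable `clock` parameter are hypothetical and exist only for illustration.

```python
import time
from typing import Callable, Optional


class FinalCooldown:
    """Allow a FINAL emission only if the cooldown since the last one has elapsed.

    Hypothetical helper; the names here are assumptions, not the repository's API.
    """

    def __init__(
        self,
        cooldown: float = 1.5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown = cooldown
        self._clock = clock
        self._last_emit: Optional[float] = None

    def try_emit(self) -> bool:
        """Return True if a FINAL may be emitted now, and record the emission."""
        now = self._clock()
        if self._last_emit is not None and now - self._last_emit < self._cooldown:
            # Still inside the cooldown window: treat this as a duplicate trigger.
            return False
        self._last_emit = now
        return True
```

A suppressed attempt does not reset the window in this sketch, so the gate reopens 1.5s after the last emitted FINAL rather than after the last attempt. Whether the real patch behaves the same way is not stated in the commit message.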
hailin 8ac1884ab4 fix: use debounce timer to auto-finalize Speechmatics partial transcripts
The LiveKit framework never sends FlushSentinel to the STT stream.
Instead it pushes silence frames and waits for FINAL_TRANSCRIPT events.
In EXTERNAL turn-detection mode, Speechmatics only emits partials.

New approach: each partial transcript restarts a 700ms debounce timer.
When partials stop (user stops speaking), the timer fires and promotes
the last partial to FINAL_TRANSCRIPT, unblocking the pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 03:08:17 -08:00
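The debounce approach described in commit 8ac1884ab4 can be illustrated with a small asyncio helper: every partial restarts a timer, and when the timer expires the last partial is handed to a callback as the final transcript. This is a sketch under assumptions, not the patch itself. `PartialDebouncer`, `on_partial`, and the `on_final` callback are hypothetical names, and the real patch operates on the plugin's speech events rather than plain strings.

```python
import asyncio
from typing import Callable, Optional


class PartialDebouncer:
    """Promote the most recent partial transcript to final after a quiet period.

    Hypothetical helper; must be used from inside a running asyncio event loop.
    """

    def __init__(self, on_final: Callable[[str], None], delay: float = 0.7) -> None:
        self._on_final = on_final
        self._delay = delay
        self._last_partial: Optional[str] = None
        self._timer: Optional[asyncio.Task] = None

    def on_partial(self, text: str) -> None:
        """Record the partial and restart the debounce timer."""
        self._last_partial = text
        if self._timer is not None:
            self._timer.cancel()  # a newer partial supersedes the pending timer
        self._timer = asyncio.get_running_loop().create_task(self._fire_after_delay())

    async def _fire_after_delay(self) -> None:
        await asyncio.sleep(self._delay)
        # No new partial arrived during the delay: promote the last one.
        text, self._last_partial = self._last_partial, None
        self._timer = None
        if text:
            self._on_final(text)
```

The default delay of 0.7 seconds matches the 700ms in this commit; the later commit 3b0119fe09 reduces it to 400ms, which would correspond to `delay=0.4` here.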
hailin de3eccafd0 debug: add verbose logging to Speechmatics monkey-patch
Trace _patched_process_audio lifecycle and FlushSentinel handling
to diagnose why final transcripts are not being promoted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 02:50:04 -08:00
hailin 1431dc0c83 fix: directly promote partial transcripts to FINAL on FlushSentinel
VoiceAgentClient.finalize() schedules an async task chain that often
loses the race against session teardown. Instead, intercept partial
segments as they arrive, stash them, and synchronously emit them as
FINAL_TRANSCRIPT when FlushSentinel fires.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 02:16:46 -08:00
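The stash-and-promote idea in commit 1431dc0c83 can be sketched as follows: keep the latest partial, and when a flush marker arrives, emit it synchronously as the final transcript instead of scheduling asynchronous work. Everything below is illustrative. The `FlushSentinel` class is a local stand-in for the framework's marker, and `PartialStash` and its methods are hypothetical names, not the plugin's API.

```python
from typing import Callable, Optional


class FlushSentinel:
    """Local stand-in for the framework's flush marker (assumption for this sketch)."""


class PartialStash:
    """Stash the latest partial and emit it as final when a flush marker arrives."""

    def __init__(self, emit_final: Callable[[str], None]) -> None:
        self._emit_final = emit_final
        self._stashed: Optional[str] = None

    def on_partial(self, text: str) -> None:
        """Remember the newest partial; older ones are superseded."""
        self._stashed = text

    def on_input(self, item: object) -> bool:
        """Handle one input item; return True if a final was emitted."""
        if isinstance(item, FlushSentinel):
            return self.flush()
        return False

    def flush(self) -> bool:
        """Synchronously promote the stashed partial, if there is one."""
        if not self._stashed:
            return False
        text, self._stashed = self._stashed, None
        self._emit_final(text)
        return True
```

Because `flush` runs to completion in the caller's stack, there is no task left to lose a race against session teardown. Note that commit 8ac1884ab4 reports that the flush marker never reaches the STT stream in practice, which is why this approach was replaced by a debounce timer.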
hailin 73fd56f30a fix: durable monkey-patch for Speechmatics finalize on flush
Move the SpeechStream._process_audio patch from container runtime
into our own source code so it survives Docker rebuilds. The patch
adds client.finalize() on FlushSentinel so EXTERNAL mode produces
final transcripts when LiveKit's VAD detects end of speech.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 02:00:42 -08:00
hailin 6707c5048d fix: use EXTERNAL mode + patch plugin to finalize on flush
EXTERNAL mode produces partial transcripts but livekit-plugins-speechmatics
does not call finalize() when receiving a flush sentinel from the framework.
A runtime monkey-patch on the plugin's SpeechStream._process_audio adds the
missing finalize() call so final transcripts are generated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 01:58:25 -08:00
hailin 8f951ad31c fix: use turn_detection=stt for Speechmatics per official docs
Speechmatics handles end-of-utterance natively via its Voice Agent
API (ADAPTIVE mode). Use turn_detection="stt" on AgentSession so
LiveKit delegates turn boundaries to the STT engine instead of
conflicting with its own VAD-based turn detection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 01:44:10 -08:00
hailin db4e70e30c fix: use EXTERNAL turn detection for Speechmatics in LiveKit pipeline
ADAPTIVE mode enables a second client-side Silero VAD inside the
Speechmatics SDK that conflicts with LiveKit's own VAD pipeline,
causing no transcription to be returned. EXTERNAL mode delegates
turn detection to LiveKit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 01:31:33 -08:00
hailin 9daf0e3b4f fix: bypass LanguageCode normalization that maps cmn back to zh
LiveKit's LanguageCode class normalizes ISO 639-3 codes to ISO 639-1
(cmn → zh), but Speechmatics API expects "cmn" not "zh". Override
the internal _stt_options.language after construction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 01:04:20 -08:00
hailin 7292ac6ca6 fix: use cmn instead of cmn_en for Speechmatics Voice Agent API
The cmn_en bilingual code is not supported by the Voice Agent API and causes a timeout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 00:19:50 -08:00
hailin 17ff9d3ce0 fix: use Speechmatics cmn_en bilingual model for Chinese-English mixed speech
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 23:57:26 -08:00
hailin 1d43943110 fix: correct Speechmatics STT language mapping and parameter name
- Map Whisper language codes (zh→cmn, en→en, etc.) to Speechmatics codes
- Fix parameter name: enable_partials → include_partials

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 23:56:37 -08:00
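The language mapping mentioned in commit 1d43943110 might look like the lookup below. Only the two pairs named in the commit message (zh→cmn, en→en) are included; the "etc." entries are not filled in here. The names `WHISPER_TO_SPEECHMATICS` and `to_speechmatics_language` are hypothetical, and the pass-through for unknown codes is an assumption of this sketch.

```python
# Pairs taken from the commit message; other entries are intentionally omitted.
WHISPER_TO_SPEECHMATICS = {
    "zh": "cmn",
    "en": "en",
}


def to_speechmatics_language(code: str) -> str:
    """Translate a Whisper-style language code to a Speechmatics code.

    Unknown codes pass through unchanged (an assumption of this sketch).
    """
    normalized = code.strip().lower()
    return WHISPER_TO_SPEECHMATICS.get(normalized, normalized)
```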
hailin f9c47de04b feat: add STT provider switching (OpenAI ↔ Speechmatics) in settings
- Add VoiceConfig entity/repo/service/controller in agent-service
  for per-tenant STT provider persistence (default: speechmatics)
- Add Speechmatics STT plugin in voice-agent with livekit-plugins-speechmatics
- Modify voice-agent entrypoint for 3-way STT selection:
  metadata > agent-service config > env var fallback
- Add "Voice" section in web-admin settings page with STT provider dropdown
- Add i18n translations (en/zh) for voice settings
- Add SPEECHMATICS_API_KEY env var in docker-compose

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 22:13:18 -08:00
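The selection order from commit f9c47de04b (metadata, then agent-service config, then env var fallback) can be sketched as a simple first-non-empty lookup. This is an illustration under assumptions: the function name, the `stt_provider` metadata key, and the `STT_PROVIDER` variable name are hypothetical, and the final default of `speechmatics` is borrowed from the per-tenant default stated in the commit.

```python
import os
from typing import Mapping, Optional


def resolve_stt_provider(
    metadata: Optional[Mapping[str, str]],
    tenant_config: Optional[str],
    env: Optional[Mapping[str, str]] = None,
    env_var: str = "STT_PROVIDER",  # hypothetical variable name
    default: str = "speechmatics",
) -> str:
    """Pick the STT provider: metadata > agent-service config > env var > default."""
    env = os.environ if env is None else env
    candidates = (
        metadata.get("stt_provider") if metadata else None,  # hypothetical key
        tenant_config,
        env.get(env_var),
    )
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip().lower()
    return default
```

Passing `env` explicitly keeps the function easy to test without touching the process environment.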