fix: use FIXED mode with 1s silence trigger instead of SMART_TURN

SMART_TURN fragments continuous speech into tiny pieces, each of which triggers an LLM request that aborts the previous one. FIXED mode instead waits for a configurable silence duration (set here to 1.0s) before emitting FINAL_TRANSCRIPT via the built-in END_OF_UTTERANCE handler.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
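The abort cascade described above can be modeled with a small asyncio sketch. It assumes, as the message states, that every FINAL_TRANSCRIPT starts a new LLM request and cancels any request still in flight. `ReplyScheduler`, `on_final_transcript`, and `simulate` are hypothetical names, not LiveKit or Speechmatics APIs, and `asyncio.sleep` stands in for the real LLM call.

```python
from __future__ import annotations

import asyncio


class ReplyScheduler:
    """Hypothetical model: each FINAL transcript starts an LLM reply and aborts the one in flight."""

    def __init__(self, llm_latency: float) -> None:
        self.llm_latency = llm_latency  # seconds the stand-in "LLM call" takes
        self._current: asyncio.Task[None] | None = None
        self._tasks: list[asyncio.Task[None]] = []
        self.completed: list[str] = []  # texts whose reply finished
        self.aborted: list[str] = []  # texts whose reply was cancelled

    async def _reply(self, text: str) -> None:
        try:
            await asyncio.sleep(self.llm_latency)  # stand-in for the real LLM request
        except asyncio.CancelledError:
            self.aborted.append(text)
            raise  # let the cancellation propagate
        self.completed.append(text)

    def on_final_transcript(self, text: str) -> None:
        # Abort the previous reply if it has not finished yet, then start a new one.
        if self._current is not None and not self._current.done():
            self._current.cancel()
        self._current = asyncio.create_task(self._reply(text))
        self._tasks.append(self._current)

    async def drain(self) -> None:
        # Wait until every reply has either completed or finished cancelling.
        await asyncio.gather(*self._tasks, return_exceptions=True)


async def simulate(finals: list[tuple[float, str]], llm_latency: float) -> ReplyScheduler:
    """Feed (delay_before_seconds, text) FINAL events into a scheduler, then wait for all replies."""
    sched = ReplyScheduler(llm_latency)
    for delay, text in finals:
        await asyncio.sleep(delay)
        sched.on_final_transcript(text)
    await sched.drain()
    return sched


if __name__ == "__main__":
    fragmented = asyncio.run(simulate([(0.0, "Hi."), (0.05, "this is Ming")], llm_latency=0.2))
    whole = asyncio.run(simulate([(0.0, "Hi, this is Ming")], llm_latency=0.2))
    print("fragmented:", fragmented.completed, fragmented.aborted)
    print("whole:", whole.completed, whole.aborted)
```

With a 0.2s stand-in latency, two fragments arriving 0.05s apart leave only the second reply completed while the first is aborted; a single whole utterance completes normally.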
2026-03-03 04:53:00 -08:00 · 2026-03-03 04:53:00 -08:00 · 191ce2d6b3
parent e8a3e07116
commit 191ce2d6b3
1 changed file with 13 additions and 8 deletions
--- a/packages/services/voice-agent/src/plugins/speechmatics_stt.py
+++ b/packages/services/voice-agent/src/plugins/speechmatics_stt.py
@@ -30,10 +30,14 @@ by the livekit-plugins-speechmatics package.
     But LiveKit also runs its own Silero VAD, so the two VADs conflict.
     Result: zero transcription output, complete silence.

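-   - SMART_TURN (recommended): Speechmatics performs smart turn detection on the server side,
-     deciding from semantics and pauses whether the user has finished and proactively sending AddSegment (FINAL_TRANSCRIPT).
-     No client-side intervention is needed; fully compatible with the LiveKit framework.
-     Official docs: https://docs.speechmatics.com/integrations-and-sdks/livekit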
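+   - SMART_TURN: server-side smart turn detection, but too aggressive: it chops continuous speech into fragments
+     (e.g. "Hello, I am..." is split into two FINALs, "Hello." + "I am..."). Each fragment triggers an LLM request
+     that aborts the previous one, which made it unusable in testing.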
+
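+   - FIXED (current): once the server detects silence of a fixed duration, it sends EndOfUtterance → finalize() → FINAL.
+     The silence threshold is controlled by the end_of_utterance_silence_trigger parameter (default 0.5s, currently 1.0s).
+     VoiceAgentClient has a built-in END_OF_UTTERANCE handler that calls finalize() automatically.
+     Official docs: https://docs.speechmatics.com/speech-to-text/realtime/turn-detection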

 3. Speaker Diarization
   - With enable_diarization=True, each segment carries speaker_id and is_active flags
@@ -78,9 +82,10 @@ def create_speechmatics_stt(language: str = "cmn") -> STT:
    stt = STT(
        language=sm_lang,
        include_partials=True,
-        # SMART_TURN: server-side smart turn detection, sends FINAL_TRANSCRIPT automatically
-        # Do not use EXTERNAL (requires manual finalize) or ADAPTIVE (conflicts with LiveKit VAD)
-        turn_detection_mode=TurnDetectionMode.SMART_TURN,
+        # FIXED: the server sends FINAL_TRANSCRIPT after detecting 1 second of silence
+        # SMART_TURN fragments continuous speech, EXTERNAL requires manual finalize, ADAPTIVE conflicts with LiveKit VAD
+        turn_detection_mode=TurnDetectionMode.FIXED,
+        end_of_utterance_silence_trigger=1.0,
        # Speaker diarization: distinguish user speech from TTS echo
        enable_diarization=True,
    )
@@ -90,7 +95,7 @@ def create_speechmatics_stt(language: str = "cmn") -> STT:
    stt._stt_options.language = sm_lang  # type: ignore[assignment]

    logger.info(
-        "Speechmatics STT created: language=%s (input=%s), mode=SMART_TURN, diarization=True",
+        "Speechmatics STT created: language=%s (input=%s), mode=FIXED(1.0s), diarization=True",
        sm_lang, language,
    )
    return stt
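To illustrate how the end_of_utterance_silence_trigger threshold changes segmentation (default 0.5s versus the 1.0s set in this commit), here is a client-side simulation of fixed-silence endpointing over made-up word timestamps. It is only an approximation: Speechmatics makes this decision server-side from the audio, and `Word` and `segment_by_silence` are hypothetical helpers, not part of any SDK.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical recognized word with start/end timestamps in seconds."""

    text: str
    start: float
    end: float


def segment_by_silence(words: list[Word], silence_trigger: float) -> list[str]:
    """Split words into utterances wherever the pause between words reaches silence_trigger.

    Mimics a FIXED end-of-utterance rule on the client side; the real decision is made
    server-side by Speechmatics.
    """
    utterances: list[str] = []
    current: list[str] = []
    prev_end: float | None = None
    for word in words:
        # A long enough pause closes the current utterance (an EndOfUtterance in FIXED mode).
        if prev_end is not None and word.start - prev_end >= silence_trigger:
            utterances.append(" ".join(current))
            current = []
        current.append(word.text)
        prev_end = word.end
    # End of input: flush the remainder (the server would first wait for trailing silence).
    if current:
        utterances.append(" ".join(current))
    return utterances


if __name__ == "__main__":
    words = [Word("Hi", 0.0, 0.3), Word("this", 1.0, 1.2), Word("is", 1.25, 1.4), Word("Ming", 1.45, 1.8)]
    for trigger in (0.5, 1.0):  # 0.5s is the stated default, 1.0s the value this commit sets
        print(trigger, segment_by_silence(words, trigger))
```

With a 0.7s pause after "Hi", a 0.5s trigger yields `["Hi", "this is Ming"]` while 1.0s yields `["Hi this is Ming"]`. The trade-off is latency: because FIXED mode waits for the full silence duration, a larger trigger also delays each FINAL by at least that much.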