fix: use FIXED mode with 1s silence trigger instead of SMART_TURN

SMART_TURN fragments continuous speech into tiny pieces, each triggering
an LLM request that aborts the previous one. FIXED mode waits for a
configurable silence duration (1.0s) before emitting FINAL_TRANSCRIPT
via the built-in END_OF_UTTERANCE handler.
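
The abort behaviour described above can be illustrated with a toy asyncio model. This is only a sketch under stated assumptions: `fake_llm_reply` and `run_finals` are hypothetical names, the 0.2 s "LLM latency" is invented, and none of this is the agent's or LiveKit's actual code. It shows that when two FINALs arrive close together, only the last fragment ever gets a reply.

```python
import asyncio
from typing import List, Optional, Tuple


async def fake_llm_reply(text: str, completed: List[str]) -> None:
    """Hypothetical stand-in for an LLM call that takes 0.2 s to answer."""
    await asyncio.sleep(0.2)
    completed.append(text)


async def run_finals(finals: List[Tuple[float, str]]) -> List[str]:
    """Feed (delay_before, transcript) pairs. Each new FINAL cancels the
    request that is still in flight, mirroring the abort described above."""
    completed: List[str] = []
    in_flight: Optional[asyncio.Task] = None
    for delay, text in finals:
        await asyncio.sleep(delay)
        if in_flight is not None and not in_flight.done():
            in_flight.cancel()  # the previous fragment never gets an answer
        in_flight = asyncio.create_task(fake_llm_reply(text, completed))
    if in_flight is not None:
        try:
            await in_flight
        except asyncio.CancelledError:
            pass
    return completed


# Two fragments 50 ms apart: the first request is aborted.
fragmented = asyncio.run(run_finals([(0.0, "hello"), (0.05, "I am calling")]))
# One whole utterance: a single request runs to completion.
whole = asyncio.run(run_finals([(0.0, "hello I am calling")]))
```

In this model `fragmented` contains only the second fragment, so the reply is generated without the first half of the sentence, while `whole` contains the full utterance.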

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
hailin 2026-03-03 04:53:00 -08:00
parent e8a3e07116
commit 191ce2d6b3
1 changed files with 13 additions and 8 deletions

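@@ -30,10 +30,14 @@ by the livekit-plugins-speechmatics package.
 LiveKit also runs its own Silero VAD, so the two VADs conflict.
 Result: zero transcription output, complete silence.
-- SMART_TURN (recommended): Speechmatics performs smart turn detection on the server side.
-It uses semantics and pauses to decide automatically whether the user has finished speaking, then proactively sends AddSegment (FINAL_TRANSCRIPT).
-No client-side intervention is needed; fully compatible with the LiveKit framework.
-Official docs: https://docs.speechmatics.com/integrations-and-sdks/livekit
+- SMART_TURN: server-side smart turn detection, but too aggressive; it chops continuous speech into fragments.
+"你好我是..." ("Hello, I am...") is split into "你好。" + "我是..." as two FINALs, and each fragment triggers an LLM request,
+which aborts the previous one. Tested and found unusable in practice.
+- FIXED (currently used): after detecting a fixed duration of silence, the server sends EndOfUtterance -> finalize() -> FINAL.
+The end_of_utterance_silence_trigger parameter controls the silence threshold (default 0.5s, currently set to 1.0s).
+VoiceAgentClient has a built-in END_OF_UTTERANCE handler that calls finalize() automatically.
+Official docs: https://docs.speechmatics.com/speech-to-text/realtime/turn-detection
 3. Speaker Diarization (speaker identification)
 - With enable_diarization=True, each segment carries speaker_id and is_active flags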
@@ -78,9 +82,10 @@ def create_speechmatics_stt(language: str = "cmn") -> STT:
     stt = STT(
         language=sm_lang,
         include_partials=True,
-        # SMART_TURN: server-side smart turn detection; sends FINAL_TRANSCRIPT automatically
-        # Do not use EXTERNAL (requires manual finalize) or ADAPTIVE (conflicts with LiveKit VAD)
-        turn_detection_mode=TurnDetectionMode.SMART_TURN,
+        # FIXED: the server sends FINAL_TRANSCRIPT after detecting 1 second of silence
+        # SMART_TURN chops continuous speech into fragments; EXTERNAL requires manual finalize; ADAPTIVE conflicts with LiveKit VAD
+        turn_detection_mode=TurnDetectionMode.FIXED,
+        end_of_utterance_silence_trigger=1.0,
         # Speaker diarization: distinguish user speech from TTS echo
         enable_diarization=True,
     )
@@ -90,7 +95,7 @@ def create_speechmatics_stt(language: str = "cmn") -> STT:
     stt._stt_options.language = sm_lang  # type: ignore[assignment]
     logger.info(
-        "Speechmatics STT created: language=%s (input=%s), mode=SMART_TURN, diarization=True",
+        "Speechmatics STT created: language=%s (input=%s), mode=FIXED(1.0s), diarization=True",
         sm_lang, language,
     )
     return stt
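
The effect of `end_of_utterance_silence_trigger=1.0` in the diff above can be sketched with a toy model that groups timestamped words into utterances whenever the pause between them reaches a threshold. This is an illustration only: `Word` and `split_on_silence` are hypothetical names, the timestamps are invented, and the short threshold is merely a stand-in for an over-eager splitter, not a description of how Speechmatics implements SMART_TURN or FIXED.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def split_on_silence(words: List[Word], silence_trigger: float) -> List[str]:
    """Close the current utterance whenever the gap before the next word
    is at least `silence_trigger` seconds."""
    utterances: List[str] = []
    current: List[str] = []
    prev_end: Optional[float] = None
    for w in words:
        if prev_end is not None and w.start - prev_end >= silence_trigger:
            utterances.append(" ".join(current))
            current = []
        current.append(w.text)
        prev_end = w.end
    if current:
        utterances.append(" ".join(current))
    return utterances


# Invented timings: a 0.5 s hesitation after "hello", a 1.3 s pause before "thanks".
words = [
    Word("hello", 0.0, 0.4),
    Word("I", 0.9, 1.0),
    Word("am", 1.05, 1.2),
    Word("calling", 1.25, 1.7),
    Word("thanks", 3.0, 3.4),
]

short_trigger = split_on_silence(words, 0.3)  # splits on the brief hesitation
long_trigger = split_on_silence(words, 1.0)   # only splits on the real pause
```

With the 0.3 s threshold the hesitation after "hello" produces a separate fragment; with 1.0 s the sentence stays whole and only the longer pause ends the utterance. The trade-off is that every reply is delayed by at least the trigger duration.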