fix: use FIXED mode with 1s silence trigger instead of SMART_TURN
SMART_TURN fragments continuous speech into tiny pieces, each triggering an LLM request that aborts the previous one. FIXED mode waits for a configurable silence duration (1.0s) before emitting FINAL_TRANSCRIPT via the built-in END_OF_UTTERANCE handler.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent e8a3e07116
commit 191ce2d6b3
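The failure mode in the commit message (each fragment starts an LLM request that aborts the previous one) can be reproduced with a small asyncio simulation. This is a sketch under stated assumptions: `fake_llm_reply` and `run_pipeline` are hypothetical (not LiveKit or Speechmatics APIs), and the "a new FINAL cancels the in-flight request" rule is modeled directly with `Task.cancel()` rather than taken from LiveKit's internals.

```python
import asyncio


async def fake_llm_reply(text: str, latency: float) -> str:
    # Hypothetical stand-in for an LLM request: sleeps, then answers.
    await asyncio.sleep(latency)
    return f"reply to: {text}"


async def run_pipeline(finals: list[str], gap: float, latency: float) -> list[str]:
    """Start one LLM request per FINAL transcript; a new FINAL aborts the in-flight request."""
    tasks: list[asyncio.Task] = []
    for i, text in enumerate(finals):
        if i > 0:
            await asyncio.sleep(gap)  # time between consecutive FINALs
        if tasks and not tasks[-1].done():
            tasks[-1].cancel()  # the previous request is aborted
        tasks.append(asyncio.create_task(fake_llm_reply(text, latency)))
    # Cancelled tasks come back as CancelledError instances; keep only completed replies.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if isinstance(r, str)]


# SMART_TURN-style fragments arriving 10 ms apart vs. one utterance-sized FINAL.
fragmented = asyncio.run(run_pipeline(["你好。", "我是..."], gap=0.01, latency=0.1))
whole = asyncio.run(run_pipeline(["你好我是..."], gap=0.01, latency=0.1))
```

With two fragments arriving 10 ms apart, only the last fragment ("我是...") gets a reply, and that reply has lost the first half of the sentence. A single FINAL carrying the whole utterance, which is what a longer silence trigger aims for, gets one reply with full context.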
@@ -30,10 +30,14 @@ by the livekit-plugins-speechmatics package.
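 But LiveKit also runs its own Silero VAD, so the two VADs conflict.
 Result: no transcription output at all; complete silence.
 
- - SMART_TURN (recommended): Speechmatics performs smart turn detection on the server side,
-   automatically deciding from semantics and pauses whether the user has finished speaking and proactively sending AddSegment (FINAL_TRANSCRIPT).
-   No client-side intervention is needed; fully compatible with the LiveKit framework.
-   Official docs: https://docs.speechmatics.com/integrations-and-sdks/livekit
+ - SMART_TURN: server-side smart turn detection, but it is too aggressive and chops continuous speech into fragments
+   (e.g. "你好我是..." ("Hello, I am...") is split into two FINALs, "你好。" + "我是..."). Each fragment triggers an LLM request
+   that aborts the previous one; in practice this proved unusable.
+
+ - FIXED (currently used): after the server detects silence of a fixed duration, it sends EndOfUtterance → finalize() → FINAL.
+   The silence threshold is set by the end_of_utterance_silence_trigger parameter (default 0.5s, currently 1.0s).
+   VoiceAgentClient has a built-in END_OF_UTTERANCE handler that calls finalize() automatically.
+   Official docs: https://docs.speechmatics.com/speech-to-text/realtime/turn-detection
 
 3. Speaker Diarization
 - With enable_diarization=True, each segment carries speaker_id and is_active flags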
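The FIXED rule (emit a final once silence lasts at least `end_of_utterance_silence_trigger` seconds) can be illustrated with a client-side sketch. It assumes word-level timestamps are available and treats the gap between one word's end and the next word's start as silence. The real detection runs on the Speechmatics server; the `Word` type and `segment_by_silence` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word timing, in seconds from stream start.
    text: str
    start: float
    end: float


def segment_by_silence(words: list[Word], silence_trigger: float) -> list[str]:
    """Close the current utterance whenever the pause before the next word is >= silence_trigger."""
    utterances: list[str] = []
    current: list[str] = []
    last_end = None
    for word in words:
        if last_end is not None and word.start - last_end >= silence_trigger:
            utterances.append("".join(current))  # silence long enough: emit a final
            current = []
        current.append(word.text)
        last_end = word.end
    if current:
        utterances.append("".join(current))  # end of input: flush what remains
    return utterances


# "你好" followed by a 0.7 s pause, then "我是小明" (Mandarin, so words join without spaces).
words = [Word("你好", 0.0, 0.4), Word("我是", 1.1, 1.4), Word("小明", 1.5, 1.9)]
```

With a 0.7 s pause after 你好, the 0.5 s default splits the sentence into two finals, while the 1.0 s trigger used in this commit keeps it as one. The trade-off is latency: every final is delayed by at least the trigger duration after the user stops speaking.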
@@ -78,9 +82,10 @@ def create_speechmatics_stt(language: str = "cmn") -> STT:
     stt = STT(
         language=sm_lang,
         include_partials=True,
-        # SMART_TURN: server-side smart turn detection; sends FINAL_TRANSCRIPT automatically
-        # Don't use EXTERNAL (requires manual finalize) or ADAPTIVE (conflicts with LiveKit's VAD)
-        turn_detection_mode=TurnDetectionMode.SMART_TURN,
+        # FIXED: the server sends FINAL_TRANSCRIPT after detecting 1 second of silence
+        # SMART_TURN chops continuous speech into fragments, EXTERNAL requires manual finalize, ADAPTIVE conflicts with LiveKit's VAD
+        turn_detection_mode=TurnDetectionMode.FIXED,
+        end_of_utterance_silence_trigger=1.0,
         # Speaker diarization: distinguish the user's speech from TTS echo
         enable_diarization=True,
     )
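Downstream, the diarization fields could be consumed roughly as follows. This sketch assumes each segment exposes `speaker_id`, `is_active`, and its text, as the docstring describes, and that `is_active` marks the speaker the agent should respond to. The `Segment` dataclass and both helpers are hypothetical, not the plugin's actual types.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape: field names follow the docstring, not the plugin's real class.
    speaker_id: str
    is_active: bool
    text: str


def group_by_speaker(segments: list[Segment]) -> dict[str, list[str]]:
    """Collect segment texts per speaker_id, preserving arrival order."""
    grouped: dict[str, list[str]] = {}
    for seg in segments:
        grouped.setdefault(seg.speaker_id, []).append(seg.text)
    return grouped


def active_text(segments: list[Segment]) -> str:
    """Join only segments flagged is_active, e.g. to drop TTS echo picked up as another speaker."""
    return " ".join(seg.text for seg in segments if seg.is_active)


segments = [
    Segment("S1", True, "what's the weather"),
    Segment("S2", False, "it is sunny today"),  # e.g. the agent's own TTS heard through the mic
    Segment("S1", True, "in Beijing"),
]
```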
@@ -90,7 +95,7 @@ def create_speechmatics_stt(language: str = "cmn") -> STT:
     stt._stt_options.language = sm_lang  # type: ignore[assignment]
 
     logger.info(
-        "Speechmatics STT created: language=%s (input=%s), mode=SMART_TURN, diarization=True",
+        "Speechmatics STT created: language=%s (input=%s), mode=FIXED(1.0s), diarization=True",
         sm_lang, language,
     )
     return stt