fix: use FIXED mode with 1s silence trigger instead of SMART_TURN
SMART_TURN fragments continuous speech into tiny pieces, each triggering an LLM request that aborts the previous one. FIXED mode waits for a configurable silence duration (1.0s) before emitting FINAL_TRANSCRIPT via the built-in END_OF_UTTERANCE handler.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent e8a3e07116
commit 191ce2d6b3
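The commit message contrasts fragmenting turn detection with a fixed silence trigger. As a rough illustration only, the sketch below groups timestamped words into utterances by pause length. It is a toy model, not the Speechmatics implementation: `Word`, `split_on_silence`, and the timings are hypothetical names and values invented for this example, and the 0.3s threshold merely stands in for an over-eager end-of-turn detector.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def split_on_silence(words, silence_trigger):
    """Group words into utterances, closing the current utterance whenever
    the pause before the next word is at least `silence_trigger` seconds."""
    utterances = []
    current = []
    prev_end = None
    for word in words:
        if prev_end is not None and word.start - prev_end >= silence_trigger:
            utterances.append(" ".join(current))
            current = []
        current.append(word.text)
        prev_end = word.end
    if current:
        utterances.append(" ".join(current))
    return utterances


# Hypothetical timings: a 0.5 s hesitation after "hello", then a 1.3 s pause.
words = [
    Word("hello", 0.0, 0.4),
    Word("this", 0.9, 1.1),
    Word("is", 1.15, 1.3),
    Word("me", 1.35, 1.7),
    Word("bye", 3.0, 3.4),
]

fragmented = split_on_silence(words, silence_trigger=0.3)
merged = split_on_silence(words, silence_trigger=1.0)
print(fragmented)  # ['hello', 'this is me', 'bye']
print(merged)      # ['hello this is me', 'bye']
```

With the short threshold, the mid-sentence hesitation yields an extra final segment, which in the agent would start an LLM request that the next segment then aborts. With the 1.0s threshold the same hesitation stays inside one utterance.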
@@ -30,10 +30,14 @@ by the livekit-plugins-speechmatics package.
      But LiveKit also runs its own Silero VAD, so the two VADs conflict.
      Result: zero transcription output, complete silence.
 
-   - SMART_TURN (recommended): Speechmatics performs smart turn detection on the server side,
-     deciding from semantics and pauses whether the user has finished, and proactively sends AddSegment (FINAL_TRANSCRIPT).
-     No client-side intervention is needed; fully compatible with the LiveKit framework.
-     Official docs: https://docs.speechmatics.com/integrations-and-sdks/livekit
+   - SMART_TURN: server-side smart turn detection, but too aggressive; it splits continuous speech into fragments
+     (e.g. "你好我是..." ("Hello, I am...") is split into two FINALs, "你好。" + "我是..."), and each fragment triggers an LLM request
+     that aborts the previous one; unusable in testing.
+
+   - FIXED (currently used): after the server detects a fixed duration of silence it sends EndOfUtterance → finalize() → FINAL.
+     The silence threshold is set via the end_of_utterance_silence_trigger parameter (default 0.5s, currently 1.0s).
+     VoiceAgentClient has a built-in END_OF_UTTERANCE handler that calls finalize() automatically.
+     Official docs: https://docs.speechmatics.com/speech-to-text/realtime/turn-detection
 
 3. Speaker Diarization
    - With enable_diarization=True, each segment carries speaker_id and is_active flags
@@ -78,9 +82,10 @@ def create_speechmatics_stt(language: str = "cmn") -> STT:
     stt = STT(
         language=sm_lang,
         include_partials=True,
-        # SMART_TURN: server-side smart turn detection, sends FINAL_TRANSCRIPT automatically
-        # Do not use EXTERNAL (requires manual finalize) or ADAPTIVE (conflicts with LiveKit VAD)
-        turn_detection_mode=TurnDetectionMode.SMART_TURN,
+        # FIXED: the server sends FINAL_TRANSCRIPT after detecting 1 second of silence
+        # SMART_TURN splits continuous speech into fragments, EXTERNAL requires manual finalize, ADAPTIVE conflicts with LiveKit VAD
+        turn_detection_mode=TurnDetectionMode.FIXED,
+        end_of_utterance_silence_trigger=1.0,
         # Speaker diarization: distinguish user speech from TTS echo
         enable_diarization=True,
     )
@@ -90,7 +95,7 @@ def create_speechmatics_stt(language: str = "cmn") -> STT:
     stt._stt_options.language = sm_lang  # type: ignore[assignment]
 
     logger.info(
-        "Speechmatics STT created: language=%s (input=%s), mode=SMART_TURN, diarization=True",
+        "Speechmatics STT created: language=%s (input=%s), mode=FIXED(1.0s), diarization=True",
         sm_lang, language,
     )
     return stt