fix: increase STT silence_duration_ms to prevent choppy transcription
The default silence_duration_ms=350 is too aggressive for Chinese speech and fragments sentences into 1-3 character chunks. Increase it to 800 ms and raise the VAD threshold to 0.6 so the STT waits longer before finalizing a turn, producing complete sentences for LLM processing. The explicit turn_detection config also sets prefix_padding_ms to 600.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent a5c95b460a
commit 186234bae2
@@ -203,6 +203,14 @@ async def entrypoint(ctx: JobContext) -> None:
             language=settings.whisper_language,
             client=_oai_client,
             use_realtime=True,
+            # Increase silence_duration_ms so Chinese speech isn't chopped
+            # into tiny fragments (default 350ms is too aggressive).
+            turn_detection={
+                "type": "server_vad",
+                "threshold": 0.6,
+                "prefix_padding_ms": 600,
+                "silence_duration_ms": 800,
+            },
         )
     else:
         stt = LocalWhisperSTT(
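
Review note: the effect described in the commit message can be illustrated with a toy model. The sketch below is not the server-side VAD; it only assumes that a turn ends once the pause after detected speech reaches silence_duration_ms. The helper `group_into_turns` and the sample timings are hypothetical, chosen to mimic speech with 400-500 ms pauses between phrases.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms) of one burst of detected speech


def group_into_turns(speech: List[Segment], silence_duration_ms: int) -> List[Segment]:
    """Merge speech bursts whose separating pause is shorter than silence_duration_ms."""
    turns: List[Segment] = []
    for start, end in sorted(speech):
        if turns and start - turns[-1][1] < silence_duration_ms:
            # Pause is too short to end the turn, so extend the current one.
            turns[-1] = (turns[-1][0], max(turns[-1][1], end))
        else:
            # Pause reached the limit (or this is the first burst): start a new turn.
            turns.append((start, end))
    return turns


# Hypothetical utterance: four bursts separated by pauses of 400, 500 and 450 ms.
speech = [(0, 600), (1000, 1500), (2000, 2700), (3150, 3800)]

print(len(group_into_turns(speech, 350)))  # every pause ends a turn
print(len(group_into_turns(speech, 800)))  # pauses are bridged into one turn
```

In this model, a 350 ms limit finalizes a turn at every pause, while an 800 ms limit bridges the same pauses and yields one complete turn, at the cost of waiting longer after the speaker actually stops.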