fix: increase STT silence_duration_ms to prevent choppy transcription
The default silence_duration_ms=350 is too aggressive for Chinese speech, causing sentences to be fragmented into 1-3 character chunks. Increase it to 800ms, raise the VAD threshold to 0.6, and set prefix_padding_ms to 600 so the STT waits longer before finalizing a turn, producing complete sentences for LLM processing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent a5c95b460a
commit 186234bae2
@@ -203,6 +203,14 @@ async def entrypoint(ctx: JobContext) -> None:
                 language=settings.whisper_language,
                 client=_oai_client,
                 use_realtime=True,
+                # Increase silence_duration_ms so Chinese speech isn't chopped
+                # into tiny fragments (default 350ms is too aggressive).
+                turn_detection={
+                    "type": "server_vad",
+                    "threshold": 0.6,
+                    "prefix_padding_ms": 600,
+                    "silence_duration_ms": 800,
+                },
             )
         else:
             stt = LocalWhisperSTT(
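
To illustrate why a longer silence window reduces fragmentation, here is a toy model of a silence-timeout rule. It is only a sketch and not the server's actual VAD: the `count_turns` helper and the sample pause lengths are hypothetical, chosen to show how pauses between 350ms and 800ms stop ending a turn once the threshold is raised.

```python
from typing import List, Tuple

# (kind, duration_ms), where kind is "speech" or "silence".
# Hypothetical representation, not a type used by the real service.
Segment = Tuple[str, int]


def count_turns(segments: List[Segment], silence_duration_ms: int) -> int:
    """Count the turns a simple silence-timeout rule would finalize.

    A turn ends when a silence segment lasts at least
    silence_duration_ms after some speech has been heard.
    """
    turns = 0
    in_turn = False
    for kind, duration_ms in segments:
        if kind == "speech":
            in_turn = True
        elif in_turn and duration_ms >= silence_duration_ms:
            turns += 1
            in_turn = False
    # Speech still pending at the end of the audio counts as one more turn.
    if in_turn:
        turns += 1
    return turns


# Made-up utterance: three short phrases separated by natural pauses.
utterance = [
    ("speech", 400), ("silence", 450),
    ("speech", 600), ("silence", 500),
    ("speech", 500), ("silence", 1200),
]

print(count_turns(utterance, 350))  # 3 turns: every pause ends a turn
print(count_turns(utterance, 800))  # 1 turn: only the final pause ends it
```

In this model the 450ms and 500ms pauses each close a turn at the 350ms setting, while at 800ms the whole utterance is finalized once, after the 1200ms pause.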