feat: enable OpenAI Realtime STT for streaming speech recognition

Switch from batch STT (gpt-4o-transcribe via /audio/transcriptions) to the streaming Realtime API over WebSocket. This eliminates the ~2s batch upload+process latency per utterance.

Also updated the nginx proxy on 67.223.119.33 to support the WebSocket upgrade for the /v1/realtime endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
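Proxying a WebSocket needs explicit configuration because `Upgrade` and `Connection` are hop-by-hop headers. A proxy does not pass them to the upstream by default, so the upstream never sees the handshake. This commit does not include the nginx change itself. The usual recipe from the nginx documentation is `proxy_http_version 1.1;` together with `proxy_set_header Upgrade $http_upgrade;` and `proxy_set_header Connection "upgrade";`. The Python sketch below is a simplified, hypothetical model of that header handling. `forward_headers` and `pass_upgrade` are illustrative names, not nginx or LiveKit APIs, and the model ignores other proxy behavior such as Host rewriting.

```python
# Simplified model of what a reverse proxy does with request headers.
# forward_headers() is a hypothetical helper, not nginx code.

# Hop-by-hop headers (RFC 7230, section 6.1) apply to one connection only,
# so a proxy drops them instead of forwarding them upstream.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def forward_headers(client_headers: dict[str, str], pass_upgrade: bool) -> dict[str, str]:
    """Return the request headers the proxy sends to the upstream server."""
    lowered = {name.lower(): value for name, value in client_headers.items()}
    # Headers named in the Connection header are hop-by-hop for this message too.
    listed = {
        token.strip().lower()
        for token in lowered.get("connection", "").split(",")
        if token.strip()
    }
    upstream = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in HOP_BY_HOP | listed
    }
    upgrade = lowered.get("upgrade")
    if pass_upgrade and upgrade:
        # Re-create the upgrade request on the proxy-to-upstream hop, which is
        # the effect of nginx's Upgrade/Connection proxy_set_header lines.
        upstream["Upgrade"] = upgrade
        upstream["Connection"] = "upgrade"
    return upstream


# Example WebSocket handshake request (key taken from RFC 6455, section 1.3).
request = {
    "Host": "proxy.example",
    "Upgrade": "websocket",
    "Connection": "Upgrade",
    "Sec-WebSocket-Key": "dGhlIHNhbXBsZSBub25jZQ==",
    "Sec-WebSocket-Version": "13",
}

print(forward_headers(request, pass_upgrade=False))
print(forward_headers(request, pass_upgrade=True))
```

With `pass_upgrade=False`, only the end-to-end headers (`Host`, `Sec-WebSocket-Key`, `Sec-WebSocket-Version`) reach the upstream. The upstream then sees an ordinary HTTP request, so the WebSocket handshake cannot complete.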
commit ba83e433d3
parent e302891f16
Date: 2026-03-01 07:49:25 -08:00
1 changed file with 1 addition and 0 deletions
--- a/packages/services/voice-agent/src/agent.py
+++ b/packages/services/voice-agent/src/agent.py
@@ -148,6 +148,7 @@ async def entrypoint(ctx: JobContext) -> None:
            model=settings.openai_stt_model,
            language=settings.whisper_language,
            client=_oai_client,
+            use_realtime=True,
        )
    else:
        stt = LocalWhisperSTT(