- Flutter: language='auto' omits the language field → backend receives none
- Backend: no language field → passes undefined to the STT service
- STT service: language=undefined → omits the language param from the Whisper request
- Whisper: auto-detects language per utterance when no hint is provided (sketched below)
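A minimal sketch of the final hop, assuming the request is built with Node 18's global FormData (names and options here are hypothetical, not the service's actual API):

```ts
// Hypothetical shape of the STT request builder; the key behavior is
// that a missing hint means the language param is omitted entirely.
interface TranscribeOptions {
  language?: string; // ISO-639-1 hint; undefined → Whisper auto-detects
}

function buildWhisperForm(audio: Buffer, opts: TranscribeOptions): FormData {
  const form = new FormData(); // global in Node 18 (undici)
  form.append('file', new Blob([audio]), 'audio.webm');
  form.append('model', 'whisper-1');
  // Omit the param when no hint survived the chain; the literal string
  // 'auto' is not a valid ISO-639-1 value for the Whisper API.
  if (opts.language) form.append('language', opts.language);
  return form;
}
```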
Node 18's native fetch (undici) ignores a custom https.Agent, causing
"fetch failed" errors against the self-signed proxy at 67.223.119.33:8443.
Switch to https.request with rejectUnauthorized: false, which works reliably.
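A minimal sketch of the workaround (host and port from this commit; the helper itself is illustrative, not the repo's actual code):

```ts
import https from 'node:https';

// https.request honors rejectUnauthorized directly, unlike Node 18's
// native fetch, which silently ignores a passed https.Agent.
function postToProxy(path: string, body: Buffer, contentType: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const req = https.request(
      {
        host: '67.223.119.33',
        port: 8443,
        path,
        method: 'POST',
        rejectUnauthorized: false, // accept the self-signed proxy cert
        headers: { 'Content-Type': contentType, 'Content-Length': body.length },
      },
      (res) => {
        let data = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => (data += chunk));
        res.on('end', () => resolve(data));
      }
    );
    req.on('error', reject);
    req.end(body);
  });
}
```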
OPENAI_BASE_URL=https://67.223.119.33:8443/v1 already includes /v1,
so the URL was being built as .../v1/v1/audio/transcriptions.
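One way to express the fix (constant name illustrative; the actual change may differ):

```ts
// The configured base URL already ends in /v1, so endpoint paths must not repeat it.
const OPENAI_BASE_URL = process.env.OPENAI_BASE_URL ?? 'https://api.openai.com/v1';

// Before: `${OPENAI_BASE_URL}/v1/audio/transcriptions` → .../v1/v1/audio/transcriptions
// After:
const url = `${OPENAI_BASE_URL}/audio/transcriptions`;
```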
New endpoint: POST /api/v1/agent/sessions/:sessionId/voice-message
- Accepts a multipart/form-data audio file (any format Whisper supports)
- Transcribes via OpenAI Whisper API (routed through existing proxy)
- If a task is currently running in the session → hard-interrupts it first
(same cancel+inject pattern as text inject, triggered by voice command)
- Otherwise → starts a fresh task with the transcript
- Returns { sessionId, taskId, transcript } so the client can subscribe to the WS stream (route sketch below)
This enables WhatsApp-style push-to-talk and doubles as an async voice
interrupt into any active agent workflow, bypassing the need for speaker
diarization (whoever presses record owns the message).
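A route-level sketch of this flow, assuming Express with multer for the multipart upload; the transcribe/task helpers are hypothetical stand-ins for the session layer:

```ts
import { Router } from 'express';
import multer from 'multer';

// Hypothetical stand-ins for the real session/agent layer.
declare function transcribe(audio: Buffer, filename: string): Promise<string>;
declare function getRunningTask(sessionId: string): { id: string } | undefined;
declare function interruptTask(sessionId: string, taskId: string, prompt: string): Promise<string>;
declare function startTask(sessionId: string, prompt: string): Promise<string>;

const upload = multer({ storage: multer.memoryStorage() });
export const router = Router();

router.post(
  '/api/v1/agent/sessions/:sessionId/voice-message',
  upload.single('audio'),
  async (req, res) => {
    const { sessionId } = req.params;
    if (!req.file) return res.status(400).json({ error: 'audio file required' });

    // 1. Transcribe the upload via Whisper (routed through the proxy).
    const transcript = await transcribe(req.file.buffer, req.file.originalname);

    // 2. Running task → cancel+inject the transcript; otherwise start fresh.
    const running = getRunningTask(sessionId);
    const taskId = running
      ? await interruptTask(sessionId, running.id, transcript)
      : await startTask(sessionId, transcript);

    // 3. The client subscribes to the WS stream using taskId.
    res.json({ sessionId, taskId, transcript });
  }
);
```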
New files:
infrastructure/stt/openai-stt.service.ts — OpenAI Whisper client,
manually builds multipart/form-data, supports self-signed proxy cert
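A minimal sketch of the manual multipart construction (boundary scheme illustrative; field names file/model match the OpenAI transcriptions API):

```ts
// Hand-rolled multipart/form-data body for the Whisper endpoint,
// needed because the request goes out via https.request, not fetch.
function buildMultipart(audio: Buffer, filename: string, model = 'whisper-1') {
  const boundary = '----stt-' + Date.now().toString(16);
  const head =
    `--${boundary}\r\n` +
    `Content-Disposition: form-data; name="model"\r\n\r\n${model}\r\n` +
    `--${boundary}\r\n` +
    `Content-Disposition: form-data; name="file"; filename="${filename}"\r\n` +
    `Content-Type: application/octet-stream\r\n\r\n`;
  const tail = `\r\n--${boundary}--\r\n`;
  const body = Buffer.concat([Buffer.from(head), audio, Buffer.from(tail)]);
  return { body, contentType: `multipart/form-data; boundary=${boundary}` };
}
```

This pairs naturally with the https.request helper sketched earlier.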