it0/packages
hailin a2af76bcd7 feat(agent-service): add voice message endpoint with Whisper STT and async interrupt
New endpoint: POST /api/v1/agent/sessions/:sessionId/voice-message
- Accepts multipart/form-data audio file (any format Whisper supports)
- Transcribes via OpenAI Whisper API (routed through existing proxy)
- If a task is currently running in the session → hard-interrupts it first
  (same cancel+inject pattern as text inject, triggered by voice command)
- Otherwise → starts a fresh task with the transcript
- Returns { sessionId, taskId, transcript } so client can subscribe to WS stream

This enables WhatsApp-style push-to-talk and doubles as an async voice
interrupt into any active agent workflow, bypassing the need for speaker
diarization (whoever presses record owns the message).

New files:
  infrastructure/stt/openai-stt.service.ts — OpenAI Whisper client,
  manually builds multipart/form-data, supports self-signed proxy cert

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 03:12:03 -08:00
..
gateway feat: implement complete commercial monetization loop (Phases 1-4) 2026-03-03 21:09:17 -08:00
services feat(agent-service): add voice message endpoint with Whisper STT and async interrupt 2026-03-06 03:12:03 -08:00
shared fix: correct billing migration schema refs and testing mock TenantInfo 2026-03-03 21:22:02 -08:00