- Add openai package to voice-service requirements
- Add /api/v1/test/tts/synthesize-openai (tts-1/tts-1-hd/gpt-4o-mini-tts)
- Add /api/v1/test/stt/transcribe-openai (gpt-4o-transcribe/whisper-1)
- Add OPENAI_API_KEY and OPENAI_BASE_URL env vars to voice-service
- Flutter test page: SegmentedButton to toggle Local/OpenAI provider
- All endpoints maintain same response format for easy comparison
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: Pipecat's WebsocketServerTransport creates its own WebSocket
server on (host,port) and expects FrameProcessor subclasses. Our code was
passing a FastAPI WebSocket object as 'host' and using plain STT/TTS/VAD
service classes that aren't FrameProcessors. The pipeline crashed immediately
when receiving audio, causing "disconnects when speaking".
Changes:
- **base_pipeline.py**: Complete rewrite — replaced Pipecat Pipeline with
direct async loop: WebSocket → VAD → STT → Claude LLM → TTS → WebSocket.
Supports barge-in (interrupt TTS when user speaks), audio chunking, and
24kHz→16kHz TTS resampling.
- **session_router.py**: Pass WebSocket directly to pipeline instead of
wrapping in AppTransport.
- **app_transport.py**: Deprecated (no longer needed).
- **kokoro_service.py**: Fix misaki compatibility (MutableToken→MToken
rename), use correct Chinese voice 'zf_xiaoxiao', handle torch tensors.
- **main.py**: Apply misaki monkey-patch before importing kokoro.
- **settings.py**: Change default TTS voice from 'zh_female_1' (non-existent)
to 'zf_xiaoxiao' (valid Kokoro-82M Chinese female voice).
- **requirements.txt**: Remove pipecat-ai dependency, pin kokoro==0.3.5 +
misaki==0.7.17, add Chinese NLP deps (pypinyin, cn2an, jieba, ordered-set).
- **agent_call_page.dart**: Wrap each cleanup step in try/catch to ensure
Navigator.pop() always executes after call ends. Add 3s timeout on session
delete request.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Required by FastAPI for form/file upload parsing. Missing dependency
may cause import errors and container restart loops.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Dockerfile.service: fix entry point path (dist/services/{name}/src/main)
due to tsconfig paths widening rootDir during compilation
- Kong config: remove unsupported ws/wss protocols (WebSocket works
automatically over http/https in Kong 3.7)
- voice-service: fix pipecat import path for v0.0.30 API
(pipecat.transports.network.websocket_server with lowercase class names)
- voice-service: add openai dependency required by pipecat anthropic service
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
faster-whisper 1.0.0 depends on av==11.* which has no prebuilt wheels
and fails to compile. Version 1.2.1 uses av 12+ with prebuilt wheels.
Also removed unnecessary FFmpeg dev libraries from Dockerfile.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>