Replace traditional on-device speech_to_text with a modern pipeline:
- Record audio via `record` package with hardware noise suppression
- Apply GTCRN neural denoising (sherpa-onnx, ICASSP 2024, 48K params)
- Trim silence, POST to backend /voice/transcribe (faster-whisper)
Changes:
- Add /transcribe endpoint to voice-service for audio file upload
- Add SpeechEnhancer wrapper for sherpa-onnx GTCRN model (523KB)
- Rewrite chat_page.dart voice input: record → denoise → transcribe
- Keep NoiseReducer.trimSilence for silence removal only
- Upgrade record to v6.2.0, add sherpa_onnx, path_provider
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Model downloads (Whisper, Kokoro, Silero VAD) are synchronous blocking
calls that prevent uvicorn from completing startup and responding to
healthchecks. Move all model loading to a daemon thread so the server
starts immediately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap model loading in try/except so server starts even if models fail
- Fix device env var mapping (unified 'device' field instead of 'whisper_device')
- Default Whisper model to 'base' instead of 'large-v3' (3GB) for CPU deployment
- Increase healthcheck start_period to 120s for model download time
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Dockerfile.service: fix entry point path (dist/services/{name}/src/main)
due to tsconfig paths widening rootDir during compilation
- Kong config: remove unsupported ws/wss protocols (WebSocket works
automatically over http/https in Kong 3.7)
- voice-service: fix pipecat import path for v0.0.30 API
(pipecat.transports.network.websocket_server with lowercase class names)
- voice-service: add openai dependency required by pipecat anthropic service
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
faster-whisper 1.0.0 depends on av==11.* which has no prebuilt wheels
and fails to compile. Version 1.2.1 uses av 12+ with prebuilt wheels.
Also removed unnecessary FFmpeg dev libraries from Dockerfile.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PyAV (av==11, dep of faster-whisper) requires pkg-config and
FFmpeg development headers to compile from source.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Server is on HK network, no need for China mirrors. Added
build-essential for compiling native Python packages (kokoro, etc).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
web-admin npm ci was timing out on the server. Added npmmirror.com
for npm and tsinghua mirror for pip to resolve network issues.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>