ESP32 ignores binary audio unless it receives tts start first.
Also skip silent frames to reduce bandwidth.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
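A minimal sketch of the silent-frame skip mentioned above, assuming frames are available as 16-bit signed PCM before encoding. The threshold value and all function names are illustrative assumptions, not taken from the commit.

```python
import array
import math

# Assumed threshold; tune for the actual microphone and gain.
SILENCE_RMS_THRESHOLD = 200


def frame_rms(pcm16: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit signed PCM frame (native byte order)."""
    samples = array.array("h")
    # Drop a trailing odd byte so frombytes() never sees a partial sample.
    samples.frombytes(pcm16[: len(pcm16) - len(pcm16) % 2])
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(pcm16: bytes, threshold: float = SILENCE_RMS_THRESHOLD) -> bool:
    """Treat a frame as silent when its RMS is below the threshold."""
    return frame_rms(pcm16) < threshold


def drop_silent_frames(frames):
    """Keep only frames that carry audible signal."""
    return [f for f in frames if not is_silent(f)]
```

An energy threshold is the simplest possible gate; a real voice-activity detector would be more robust against background noise.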
The ESP32 stops sending audio when it receives tts start because it
assumes the server is speaking. Acknowledge detect silently instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: Frida crashed because open_voice triggered a UI transition
while the hook was active. Now: open voice chat first (one-shot script),
then attach the bridge once libantaudio.so is already loaded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: recv_loop started before open_voice completed, so the bridge
connection died during the UI transition. Now setup completes first.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
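The ordering fix above can be sketched with asyncio, assuming both steps are coroutines. The function run_session and its parameters are hypothetical stand-ins for the real setup and receive code.

```python
import asyncio


async def run_session(open_voice, recv_loop):
    """Finish setup before the receive loop is allowed to start.

    open_voice and recv_loop are hypothetical coroutine functions.
    """
    # Wait for the UI transition to complete first.
    await open_voice()
    # Only now start reading from the bridge connection.
    return asyncio.create_task(recv_loop())
```

Awaiting open_voice before creating the task guarantees the receive loop never runs during the UI transition.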
Ant Afu tends to give long replies, which cause TTS queue delays.
Append "请用2-3句话简短回答" ("Please answer briefly in 2-3 sentences")
to reduce response length.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The model outputs 44100 Hz audio but the device expects 24000 Hz via
Opus. Without resampling, audio plays at the wrong speed, causing 29 s
delays between segments. Verified: synthesis+resample takes 0.38 s for
1.6 s of audio.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
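For illustration: 44100 Hz samples played as 24000 Hz run at 24000/44100 ≈ 0.54x speed, so 1.6 s of audio stretches to roughly 2.9 s. The sketch below resamples by linear interpolation in pure Python. The commit does not say which resampler is used, so this is an assumed stand-in. It also omits the low-pass filter a production resampler would apply before downsampling.

```python
def resample_linear(samples, src_rate=44100, dst_rate=24000):
    """Resample a mono sample sequence by linear interpolation.

    Illustrative only: no anti-aliasing filter is applied.
    """
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Blend the two neighbouring source samples.
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out
```

One second of input (44100 samples) yields 24000 output samples, which then plays back at the correct speed on the device.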
Benchmark: short=0.37s, long=1.06s with 8 CPU threads.
GPU is not available in the pip sherpa-onnx package; CPU is fast enough.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Filter "完成资料引用" ("finished citing references") and other status
  text from Antaf responses
- Use int8 quantized model for faster TTS inference
- Add configurable num_threads for sherpa-onnx
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
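A minimal sketch of the status-text filter from the first bullet. Only the phrase "完成资料引用" comes from the commit. The tuple name and function are hypothetical, and the other filtered phrases are not listed in the source.

```python
# Only this phrase is named in the commit; extend the tuple as needed.
STATUS_PHRASES = ("完成资料引用",)


def strip_status_text(text, phrases=STATUS_PHRASES):
    """Remove known status phrases so TTS does not read them aloud."""
    for phrase in phrases:
        text = text.replace(phrase, "")
    return text.strip()
```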
Offline VITS TTS using sherpa-onnx, no network dependency.
Uses vits-melo-tts-zh_en model for Chinese/English.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ASR wraps user speech as JSON {"content":"...", "language":"zh", "emotion":"..."}.
Extract only the content field instead of sending raw JSON to Antaf bridge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
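The content extraction above might look like the following sketch. The function name is hypothetical, and falling back to the raw text when the input is not the expected JSON is an assumption about the desired behaviour.

```python
import json


def extract_content(asr_text: str) -> str:
    """Return the "content" field of an ASR JSON wrapper, else the raw text."""
    stripped = asr_text.strip()
    if stripped.startswith("{"):
        try:
            payload = json.loads(stripped)
        except json.JSONDecodeError:
            # Not valid JSON after all; pass the text through unchanged.
            return asr_text
        if isinstance(payload, dict) and isinstance(payload.get("content"), str):
            return payload["content"]
    return asr_text
```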
The xiaozhi server injects tool_call reminders and system prompts as
role=user messages into dialogue. These were being picked up as the
"last user message" and sent to Antaf bridge instead of the actual query.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
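A sketch of picking the actual query, assuming the dialogue is a list of {"role", "content"} dicts. How the real code recognises injected messages is not stated in the commit. The prefix markers below are hypothetical placeholders.

```python
# Hypothetical markers; the real injected-message format is not in the commit.
INJECTED_MARKERS = ("[tool_call reminder]", "[system prompt]")


def last_real_user_message(dialogue, markers=INJECTED_MARKERS):
    """Return the newest role=user content that is not an injected message."""
    for msg in reversed(dialogue):
        if msg.get("role") != "user":
            continue
        content = msg.get("content") or ""
        if any(content.startswith(marker) for marker in markers):
            continue  # skip reminders and prompts injected as role=user
        return content
    return None
```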
Ant Afu returns internal reasoning/thinking process mixed with actual
response text, causing TTS to read out internal monologue. Also fixes
duplicate text chunks being sent repeatedly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
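The duplicate-chunk part of this fix can be sketched as a generator that forwards each distinct chunk once. The function name is hypothetical. Stripping the reasoning text is not shown because its format is not described in the commit.

```python
def dedupe_chunks(chunks):
    """Yield each non-empty text chunk once, in first-seen order."""
    seen = set()
    for chunk in chunks:
        key = chunk.strip()
        if not key or key in seen:
            continue  # skip blanks and chunks already sent to TTS
        seen.add(key)
        yield chunk
```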
Two approaches to replace self-hosted Qwen3-32B with Ant Afu AI:
- Plan A: Custom LLM Provider (text API via Frida HTTP Bridge)
- Plan B: Full voice passthrough (audio injection via voice bridge)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>