Commit Graph

17 Commits

Author SHA1 Message Date
hailin bbdb59cc05 debug: add audio frame counters to relay for troubleshooting
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 05:22:52 -07:00
hailin ec17c085b2 fix: setup voice before recv_loop, drain responses, check ws state
Root cause: recv_loop started before open_voice completed, and the bridge
connection died during the UI transition. Setup now completes first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 05:13:54 -07:00
hailin 216f2fe6a0 feat: add voice relay (Plan B) - ESP32 audio passthrough to Antaf
- voice_bridge_v7.js: audio injection support (type=3 frames)
- relay.py: WebSocket↔TCP bridge with Opus↔PCM + resampling
- test_inject.py: injection verification script
- Injection verified: 1454 frames stable, no crash

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 05:05:24 -07:00
hailin b70c1dd071 feat: append concise reply hint to Antaf queries
Ant Afu tends to give long replies, which cause TTS queue delays.
Append "请用2-3句话简短回答" ("Please answer briefly in 2-3 sentences") to reduce response length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 04:30:32 -07:00
hailin 5679622996 fix: resample TTS audio from 44100Hz to 24000Hz for device compatibility
Model outputs 44100Hz but device expects 24000Hz via Opus. Without
resampling, audio plays at wrong speed causing 29s delays between
segments. Verified: synthesis+resample takes 0.38s for 1.6s audio.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 23:11:48 -07:00
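
The rate mismatch in the commit above can be illustrated with a small sketch. Playing 44100 Hz samples at 24000 Hz stretches playback by 44100/24000, roughly 1.84 times, so the audio has to be converted before encoding. The commit does not say which resampling method was used. The function below is a hypothetical, standard-library-only linear interpolator with no anti-aliasing filter, shown only to make the sample-count arithmetic concrete; `resample_linear` is an illustrative name, not one from the repository.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample by linear interpolation (illustrative only, no anti-aliasing filter)."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    # Number of output samples covering the same duration at the new rate.
    out_len = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step                 # fractional position in the source
        lo = min(int(pos), last)       # sample at or before pos
        hi = min(lo + 1, last)         # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# One second of silence at 44100 Hz becomes one second at 24000 Hz.
one_second = [0.0] * 44100
converted = resample_linear(one_second, 44100, 24000)
```

A production path would more likely use a filtered resampler from an audio library; the sketch only shows that one second of input maps to 24000 output samples.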
hailin 9b2b875c2b fix: run TTS synthesis in thread pool to avoid blocking event loop
Also add size check for int8 model to skip LFS pointer files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 23:05:44 -07:00
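
The thread-pool fix above follows a common asyncio pattern, sketched below under stated assumptions. `synthesize_blocking` is a hypothetical stand-in for the real CPU-bound TTS call (it just sleeps), and the heartbeat task exists only to show that the event loop keeps running while synthesis is in progress. None of these names come from the repository.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def synthesize_blocking(text: str) -> bytes:
    """Hypothetical stand-in for a CPU-bound TTS call; returns fake audio bytes."""
    time.sleep(0.05)  # simulate inference time
    return text.encode("utf-8")


async def synthesize(text: str, pool: ThreadPoolExecutor) -> bytes:
    loop = asyncio.get_running_loop()
    # Hand the blocking call to a worker thread so the event loop stays responsive.
    return await loop.run_in_executor(pool, synthesize_blocking, text)


async def demo():
    ticks = 0

    async def heartbeat() -> None:
        # Counts how often the event loop gets to run during synthesis.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.005)

    with ThreadPoolExecutor(max_workers=2) as pool:
        beat = asyncio.create_task(heartbeat())
        audio = await synthesize("hello", pool)
        beat.cancel()
    return audio, ticks


audio, ticks = asyncio.run(demo())
```

If `synthesize_blocking` were called directly inside the coroutine, the heartbeat would not advance until synthesis returned, which is the blocking behaviour the commit describes.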
hailin 83cdf3396d fix: use full onnx model with 8 threads for fast local TTS
Benchmark: short=0.37s, long=1.06s with 8 CPU threads.
GPU not available in pip sherpa-onnx, CPU is fast enough.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 22:53:59 -07:00
hailin c2727d7e08 fix: clean junk text from Antaf + use int8 TTS model for speed
- Filter "完成资料引用" ("reference citation complete") and other status text from Antaf responses
- Use int8 quantized model for faster TTS inference
- Add configurable num_threads for sherpa-onnx

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 22:52:02 -07:00
hailin e5599d4f43 feat: add sherpa-onnx local TTS provider
Offline VITS TTS using sherpa-onnx, no network dependency.
Uses vits-melo-tts-zh_en model for Chinese/English.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 22:43:51 -07:00
hailin 12b4994ac0 fix: extract content from ASR JSON before sending to Antaf
ASR wraps user speech as JSON {"content":"...", "language":"zh", "emotion":"..."};
extract only the content field instead of sending the raw JSON to the Antaf bridge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 20:32:45 -07:00
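
The extraction described in the commit above might look like the following minimal sketch. The payload shape is taken from the commit message; the function name `extract_asr_content` and the fallback to the raw input when the text is not JSON are assumptions made for illustration.

```python
import json


def extract_asr_content(raw: str) -> str:
    """Return the "content" field of an ASR JSON payload, or the input unchanged."""
    try:
        payload = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Not JSON at all: treat the input as plain recognized text.
        return raw
    if isinstance(payload, dict) and isinstance(payload.get("content"), str):
        return payload["content"]
    # Valid JSON without a usable "content" string: fall back to the raw input.
    return raw


query = extract_asr_content('{"content": "hello", "language": "zh", "emotion": "neutral"}')
```

Falling back to the raw string keeps the pipeline working if the ASR stage ever emits plain text instead of the JSON wrapper.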
hailin cb9d430cfc fix: skip system-injected user messages, only send real user query to Antaf
The xiaozhi server injects tool_call reminders and system prompts as
role=user messages into dialogue. These were being picked up as the
"last user message" and sent to Antaf bridge instead of the actual query.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 20:27:37 -07:00
hailin f461e341ba fix: filter thinking content and deduplicate SSE chunks in AntafLLM
Ant Afu returns internal reasoning/thinking process mixed with actual
response text, causing TTS to read out internal monologue. Also fixes
duplicate text chunks being sent repeatedly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 20:05:42 -07:00
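
The deduplication half of the commit above can be sketched as a small generator. The commit does not describe how duplicates arrive or how thinking content is marked, so this assumes the simplest case: the same text chunk is delivered several times in a row. `drop_repeated_chunks` is a hypothetical name, and the thinking filter is deliberately left out because its format is not given.

```python
from typing import Iterable, Iterator, Optional


def drop_repeated_chunks(chunks: Iterable[str]) -> Iterator[str]:
    """Yield chunks, skipping empty ones and immediate repeats of the last yielded chunk."""
    previous: Optional[str] = None
    for chunk in chunks:
        if not chunk or chunk == previous:
            continue
        previous = chunk
        yield chunk


cleaned = list(drop_repeated_chunks(["Hi", "Hi", " there", "", " there", "!"]))
```

One caveat of this approach is that it also drops legitimate back-to-back repeats (for example "ha" followed by "ha"), so a real stream with event IDs or cumulative text would call for a different strategy.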
hailin d399a21f23 feat: add AntafLLM provider for Ant Afu text API integration
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 12:27:49 -07:00
hailin 688a5e17b3 docs: add antaf integration plan for ESP32 device
Two approaches to replace self-hosted Qwen3-32B with Ant Afu AI:
- Plan A: Custom LLM Provider (text API via Frida HTTP Bridge)
- Plan B: Full voice passthrough (audio injection via voice bridge)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 12:26:31 -07:00
hailin ae260da3eb add frontend code
2026-04-05 12:05:11 -07:00
hailin 742389e965 add backend code
2026-04-05 19:01:15 +00:00
hailin ac9061f06d first commit
2026-04-05 18:55:20 +00:00