1. Remove on_enter greeting entirely (no more race condition)
2. voice-agent sends voiceMode: true when engine_type is claude_agent_sdk
3. AgentController.runTaskStream() filters thinking, tool_use, tool_result
events in voice mode — only text, completed, error reach the client
4. Detailed logging: each event logged with [FILTERED-voice] tag when skipped
Claude API mode is completely unaffected (voiceMode defaults to false).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. Change on_enter greeting from generate_reply() to session.say() with
a static message — avoids spawning an Agent SDK task just for a greeting,
which caused a race condition when the user speaks before it completes.
2. Clear agent session ID when receiving abort/exit errors so the next
task starts a fresh session instead of trying to resume a dead process.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Full-stack implementation allowing users to choose between Claude Agent SDK
(default, with tool approval, skill injection, session resume) and Claude API
(direct, lower latency) in Flutter settings. Agent SDK mode wraps prompts with
voice-conversation instructions for concise spoken Chinese output.
Data flow: Flutter Settings → SharedPreferences → POST /livekit/token →
RoomAgentDispatch metadata → voice-agent → AgentServiceLLM(engine_type)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Flutter (agent_call_page.dart):
- Add ConnectOptions with 15s timeouts for connection/peerConnection/iceRestart
- Add RoomReconnectingEvent/RoomAttemptReconnectEvent/RoomReconnectedEvent
listeners with "网络重连中" UI indicator during reconnection
- Add TimeoutException detection in _friendlyError()
voice-agent (agent.py):
- Wrap entrypoint() in try-except with full traceback logging
- Register room disconnect listener to close httpx clients (instead of
finally block, since session.start() returns while session runs in bg)
- Add asyncio import for ensure_future cleanup
voice-agent LLM proxy (agent_llm.py):
- Add retry with exponential backoff (max 2 retries, 1s/3s delays) for
network errors (ConnectError/ConnectTimeout/OSError) and WS InvalidStatusCode
- Extract _do_stream() method for single-attempt logic
- Add WebSocket connection params: open_timeout=10, ping_interval=20,
ping_timeout=10 for keepalive and faster dead-connection detection
- Use granular httpx.Timeout(connect=10, read=30, write=10, pool=10)
- Increase WS recv timeout from 5s to 30s to reduce unnecessary loops
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Default silence_duration_ms=350 is too aggressive for Chinese speech,
causing sentences to be fragmented into 1-3 character chunks. Increase
to 800ms and raise VAD threshold to 0.6 so the STT waits longer before
finalizing a turn, producing complete sentences for LLM processing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The OpenAI Realtime STT uses aiohttp WebSocket connections (not httpx),
so the existing httpx verify=False fix does not apply. LiveKit's
http_context creates aiohttp.TCPConnector without ssl=False, causing
SSL certificate verification errors when OPENAI_BASE_URL points to a
proxy with a self-signed certificate.
Monkey-patch http_context._new_session_ctx to inject ssl=False into the
aiohttp connector, fixing the "CERTIFICATE_VERIFY_FAILED" error for
Realtime STT WebSocket connections.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add user-configurable TTS voice and tone style settings that flow from
the Flutter app through the backend to the voice-agent at call time.
## Flutter App (it0_app)
### Domain Layer
- app_settings.dart: Add `ttsVoice` (default: 'coral') and `ttsStyle`
(default: '') fields to AppSettings entity with copyWith support
### Data Layer
- settings_datasource.dart: Add SharedPreferences keys
`settings_tts_voice` and `settings_tts_style` for local persistence
in loadSettings(), saveSettings(), and clearSettings()
### Presentation Layer
- settings_providers.dart: Add `setTtsVoice()` and `setTtsStyle()`
methods to SettingsNotifier for Riverpod state management
- settings_page.dart: Add "语音" settings group between Notifications
and Security groups with:
- Voice picker: 13 OpenAI voices with gender/style labels
(e.g. "女 · 温暖", "男 · 沉稳", "中性") in a BottomSheet
- Style picker: 5 presets (专业干练/温柔耐心/轻松活泼/严肃正式/科幻AI)
as ChoiceChips + custom text input field + reset button
### Call Flow
- agent_call_page.dart: Send `tts_voice` and `tts_style` in the POST
body when requesting a LiveKit token at call initiation
## Backend
### voice-service (Python/FastAPI)
- livekit_token.py: Accept optional `tts_voice` and `tts_style` via
Pydantic TokenRequest body model; embed them in RoomAgentDispatch
metadata JSON alongside auth_header (backward compatible)
### voice-agent (Python/LiveKit Agents)
- agent.py: Extract `tts_voice` and `tts_style` from ctx.job.metadata;
use them when creating openai_plugin.TTS() — user-selected voice
overrides config default, user-selected style overrides default
instructions. Falls back to config defaults when not provided.
## Data Flow
Flutter Settings → SharedPreferences → POST /livekit/token body →
voice-service embeds in RoomAgentDispatch metadata →
voice-agent reads from ctx.job.metadata → TTS creation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Switch from tts-1 to gpt-4o-mini-tts for lower latency and better quality
- Change voice from alloy to coral
- Add Chinese speech instructions for natural tone control
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch from batch STT (gpt-4o-transcribe via /audio/transcriptions)
to streaming Realtime API (WebSocket). This eliminates the ~2s batch
upload+process latency per utterance.
Also updated nginx proxy on 67.223.119.33 to support WebSocket upgrade
for /v1/realtime endpoint.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Pass httpx.AsyncClient(verify=False) to OpenAI STT/TTS to support
self-signed certificate on OPENAI_BASE_URL proxy
- Handle generate_reply calls with no user message by falling back to
system/developer instructions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In livekit-agents v1.x @server.rtc_session() pattern, ctx.room is not
yet connected when entrypoint is called. session.start() handles room
connection internally.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add room_input_options/room_output_options to session.start() so agent
binds audio I/O and stays in the room
- Add wait_for_participant() before starting session
- Filter AgentConfigUpdate items in agent_llm.py (no 'role' attribute)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace deprecated WorkerOptions(entrypoint_fnc=...) with AgentServer() +
@server.rtc_session() decorator. Use server.setup_fnc for prewarm. Remove
manual ctx.connect() and ctx.wait_for_participant() calls that prevented
the pipeline from properly wiring up VAD→STT→LLM→TTS.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RoomInputOptions is deprecated in livekit-agents 1.4.x. Switch to
RoomOptions with explicit audio_input/audio_output enabled.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
LiveKit passes RoomAgentDispatch metadata through as job.metadata
(protobuf field), not via a separate agent_dispatch object. Also
use room_io.RoomInputOptions for participant targeting (livekit-agents 1.x).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
livekit-agents 1.x removed the 'participant' parameter from
AgentSession.start(). Use room_input_options with participant_identity
instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Upgrade websockets from ==12.0 to >=13.0 (openai[realtime] requires >=13)
- Install torch CPU-only build separately in Dockerfile to avoid ~2GB CUDA download
- Remove torch from requirements.txt (installed via --index-url cpu wheel)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>