The bind-mounted SSH key is owned by host uid (1000/node) but the
service runs as appuser (uid 1001). Use su-exec in entrypoint.sh
to copy the key as root, fix ownership, then drop privileges.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mount host key to /tmp/host-ssh-key (read-only), then copy to
appuser's .ssh directory with correct ownership at container start.
Fixes "Permission denied" due to uid mismatch on bind mount.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add openssh-client to Dockerfile.service (alpine)
- Create .ssh directory with correct permissions for appuser
- Mount host SSH key into agent-service container (read-only)
This allows the Agent SDK to SSH into managed servers using the Bash tool.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The createCredential method was missing the tenantId assignment,
causing a NOT NULL constraint violation on the credentials table.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Server side (session_router.py):
- /reconnect now accepts sessions in "active" state (not just "disconnected")
- When client reconnects to an active session, the old WebSocket/pipeline is
automatically replaced when the new WebSocket connects
- Only truly terminal states (e.g. "ended") return 409
Flutter side (agent_call_page.dart):
- Distinguish terminal errors (404 session gone, 409 ended) from transient
errors (network timeout, server unreachable) in reconnect loop
- Terminal errors break immediately instead of wasting retry attempts
- Extract _connectWebSocket() helper for cleaner reconnect flow
- Add DioException handling for proper HTTP status code inspection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FlutterSoundPlayer.feedUint8FromStream() requires interleaved mode.
With interleaved=false, every feed() call threw:
"Cannot feed with UInt8 with non interleaved mode"
- feedUint8FromStream (Uint8List) → requires interleaved: true
- feedFromStream (Float32List) → requires interleaved: false
Since we feed raw PCM bytes (Uint8List), interleaved must be true.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: PcmPlayer called openPlayer() without audio session config,
so Android defaulted to earpiece-only mode. When the mic was actively
recording, playback was silently suppressed — the agent's TTS audio was
sent successfully over WebSocket but never reached the speaker.
Changes:
1. PcmPlayer (pcm_player.dart):
- Added audio_session package for proper audio session management
- Configure AudioSession with playAndRecord category so mic + speaker
work simultaneously
- Set voiceCommunication usage to enable Android hardware AEC (echo
cancellation) — prevents feedback loops when speaker is active
- defaultToSpeaker routes output to loudspeaker instead of earpiece
- Restored setSpeakerOn() method stub (used by UI toggle)
2. AgentCallPage (agent_call_page.dart):
- Fixed fire-and-forget bug: _pcmPlayer.feed() returns Future but was
called without await, causing interleaved feedUint8FromStream calls
- Added _feedChain serializer to guarantee sequential audio feeding
3. Dependencies:
- Added audio_session package to pubspec.yaml
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OpenAI TTS returns 24kHz audio which Android MediaPlayer can't play
via FlutterSound's pcm16WAV codec. Request raw PCM and resample to
16kHz before wrapping in WAV header, matching the local TTS format.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Local /synthesize and /transcribe endpoints now auto-load Kokoro/Whisper
models on first call instead of returning 503 when not pre-loaded at
startup. This allows switching between Local and OpenAI providers in the
Flutter test page without requiring server restart.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: IOWebSocketChannel.sink.close() can hang indefinitely
(dart-lang/web_socket_channel#185). Previous fix used unawaited close
but didn't cancel the stream subscription, so the old listener could
still push events to _messageController.
Fix: Extract _closeCurrentConnection() that:
1. Cancels StreamSubscription first (stops duplicate events immediately)
2. Fire-and-forget sink.close(goingAway) (frees underlying socket)
This follows the workaround recommended in the official issue tracker.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The await on sink.close() blocks indefinitely when the server doesn't
respond to the close handshake. Use fire-and-forget with unawaited()
so the new connection can proceed immediately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When sending a second message in the same session, the old WebSocket
connection was not closed, causing both connections to subscribe to the
same session room. This resulted in each text event being received twice,
producing garbled/duplicated output text.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without this, the SDK engine fails to create tenant HOME directories
because the Docker volume mount point doesn't exist and appuser lacks
write permissions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace prompt-prefix workaround with SDK's native resume mechanism.
Each tenant gets isolated HOME directory (/data/claude-tenants/{tenantId})
to prevent cross-tenant session file mixing. SDK session IDs are persisted
in session.metadata for cross-request resume support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement DB-based conversation message storage (engine-agnostic) that
works across both Claude API and Agent SDK engines. Add ChatGPT/Claude-style
conversation history drawer in Flutter with date-grouped session list,
session switching, and new chat functionality.
Backend: entity, repository, context service, migration 004, session/message
API endpoints. Flutter: ConversationDrawer, sessionId flow from backend
response via SessionInfoEvent, session list/switch/delete support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove shared interceptors from the binary Dio instance to prevent
request dedup/retry interceptors from interfering with audio downloads.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add STT_PROVIDER/TTS_PROVIDER config (local or openai) in settings
- Pipeline uses OpenAI API for STT/TTS when provider is "openai"
- Skip loading local models (Kokoro/faster-whisper) when using OpenAI
- VAD (Silero) always loads for speech detection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded Colors.grey with Theme.of(context).colorScheme for
result containers and status text so they're readable in both light
and dark themes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add openai package to voice-service requirements
- Add /api/v1/test/tts/synthesize-openai (tts-1/tts-1-hd/gpt-4o-mini-tts)
- Add /api/v1/test/stt/transcribe-openai (gpt-4o-transcribe/whisper-1)
- Add OPENAI_API_KEY and OPENAI_BASE_URL env vars to voice-service
- Flutter test page: SegmentedButton to toggle Local/OpenAI provider
- All endpoints maintain same response format for easy comparison
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove record package dependency, use FlutterSoundRecorder instead
- Use permission_handler for microphone permission (already in pubspec)
- Proper temp file path via path_provider
- Cleanup temp files after upload
- Single package (flutter_sound) handles both recording and playback
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The record package requires a valid file path. Empty string caused
ENOENT (No such file or directory) on Android.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Claude API engine now uses streaming API (messages.stream) for real-time
text delta output instead of waiting for full response
- Agent controller accepts optional engineType body parameter to allow
callers (e.g. voice pipeline) to select a specific engine
- Fix voice_test_page.dart compilation error: replace audioplayers (not
installed) with flutter_sound (already in pubspec.yaml)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- STT: record from mic or upload audio file → faster-whisper transcription
- Round-trip: record → STT → TTS → playback (full pipeline test)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Browser-accessible page to test text-to-speech synthesis without
going through the full voice pipeline.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the first punctuation mark appeared before _MIN_SENTENCE_LEN chars,
the regex search would always find it first and skip it, permanently
blocking all subsequent sentence splits. Fix by advancing search_start
past short matches instead of breaking out of the loop.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The agent-service was missing the ANTHROPIC_BASE_URL environment variable,
causing the Claude Agent SDK to call api.anthropic.com directly instead of
going through the proxy at 67.223.119.33, resulting in 403 Forbidden errors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add finished guard so that once a task reaches completed/error terminal
state, subsequent events don't flip the status back.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Default to OAuth subscription billing via ~/.claude/.credentials.json
instead of consuming API key credits.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thinking content auto-expands while streaming, auto-collapses when done.
User can toggle with "Thinking ∨" button, matching Claude Code VSCode UX.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SDK sends text both via stream_event deltas (token-level) and assistant
message (complete block). Track hasStreamedText flag per session to skip
duplicate text extraction from assistant messages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace batch TTS (wait for full response) with streaming approach:
- _agent_generate → _agent_stream async generator (yield text chunks)
- _process_speech accumulates tokens, splits on sentence boundaries
- Each sentence is TTS'd and sent immediately while more tokens arrive
- First audio plays within ~1s of agent response vs waiting for full text
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root causes found:
1. SDK engine only emitted 'completed' without 'text' events because
mapSdkMessage skipped text blocks in 'assistant' messages (assumed
stream_event deltas would provide them, but SDK didn't send deltas)
2. Voice pipeline read evt_data.data.content but engine events are flat
(evt_data.content) — so even if text arrived, it was never extracted
Fixes:
- Extract text/thinking blocks from assistant messages in SDK engine
- Fix voice pipeline to read content directly from evt_data, not nested
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Log every SDK message type, event emission, and stream lifecycle
to diagnose why text events are missing in voice-agent flow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Log timestamps, content, and event details at each pipeline stage
to help diagnose voice-agent integration issues.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Buffer stream events when no WS clients are subscribed yet, then replay
them when a client subscribes. This eliminates the race condition where
events are lost between task creation and WS subscription.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The engine stream could emit text events before the voice pipeline
subscribed, causing all text to be lost. Now we connect and subscribe
first, then POST the task.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Voice calls now use the same agent task + WS subscription flow as the
chat UI, enabling tool use and command execution during voice sessions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: Pipecat's WebsocketServerTransport creates its own WebSocket
server on (host,port) and expects FrameProcessor subclasses. Our code was
passing a FastAPI WebSocket object as 'host' and using plain STT/TTS/VAD
service classes that aren't FrameProcessors. The pipeline crashed immediately
when receiving audio, causing "disconnects when speaking".
Changes:
- **base_pipeline.py**: Complete rewrite — replaced Pipecat Pipeline with
direct async loop: WebSocket → VAD → STT → Claude LLM → TTS → WebSocket.
Supports barge-in (interrupt TTS when user speaks), audio chunking, and
24kHz→16kHz TTS resampling.
- **session_router.py**: Pass WebSocket directly to pipeline instead of
wrapping in AppTransport.
- **app_transport.py**: Deprecated (no longer needed).
- **kokoro_service.py**: Fix misaki compatibility (MutableToken→MToken
rename), use correct Chinese voice 'zf_xiaoxiao', handle torch tensors.
- **main.py**: Apply misaki monkey-patch before importing kokoro.
- **settings.py**: Change default TTS voice from 'zh_female_1' (non-existent)
to 'zf_xiaoxiao' (valid Kokoro-82M Chinese female voice).
- **requirements.txt**: Remove pipecat-ai dependency, pin kokoro==0.3.5 +
misaki==0.7.17, add Chinese NLP deps (pypinyin, cn2an, jieba, ordered-set).
- **agent_call_page.dart**: Wrap each cleanup step in try/catch to ensure
Navigator.pop() always executes after call ends. Add 3s timeout on session
delete request.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace traditional chat bubble layout with a Claude Code-inspired
timeline/workflow design:
- Vertical gray line connecting sequential event nodes
- Colored dots for each event (green=done, red=error, yellow=warning)
- Animated spinning asterisk (*) on active nodes
- Streaming text with blinking cursor in timeline nodes
- Tool execution shown as code blocks within timeline
- User prompts as distinct nodes with person icon
New file: timeline_event_node.dart (TimelineEventNode, CodeBlock)
Rewritten: chat_page.dart (timeline layout, no more bubbles)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>