Flutter (agent_call_page.dart):
- Add ConnectOptions with 15s timeouts for connection/peerConnection/iceRestart
- Add RoomReconnectingEvent/RoomAttemptReconnectEvent/RoomReconnectedEvent
listeners with "网络重连中" UI indicator during reconnection
- Add TimeoutException detection in _friendlyError()
voice-agent (agent.py):
- Wrap entrypoint() in try-except with full traceback logging
- Register room disconnect listener to close httpx clients (instead of
finally block, since session.start() returns while session runs in bg)
- Add asyncio import for ensure_future cleanup
voice-agent LLM proxy (agent_llm.py):
- Add retry with exponential backoff (max 2 retries, 1s/3s delays) for
network errors (ConnectError/ConnectTimeout/OSError) and WS InvalidStatusCode
- Extract _do_stream() method for single-attempt logic
- Add WebSocket connection params: open_timeout=10, ping_interval=20,
ping_timeout=10 for keepalive and faster dead-connection detection
- Use granular httpx.Timeout(connect=10, read=30, write=10, pool=10)
- Increase WS recv timeout from 5s to 30s to reduce unnecessary loops
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
LiveKit's use_external_ip auto-detected 154.84.135.121 (overseas) via
STUN, causing WebRTC ICE candidates to use an unreachable IP for
domestic mobile clients. Explicitly set node_ip to 14.215.128.96.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Flutter app now uses https://it0api.szaiai.com (nginx reverse proxy)
instead of direct IP:port. LiveKit URL uses China IP 14.215.128.96
for lower latency from domestic mobile clients.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
iconsulting-llm-gateway already occupies port 3008 on the host.
voice-service only has a single TCP port (no docker-proxy overhead),
so bridge networking with 13008:3008 mapping is sufficient.
Only livekit-server and voice-agent need host mode (UDP port ranges).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bridge mode created 600+ docker-proxy processes for LiveKit's UDP port-range
mappings (30000-30100, 50000-50200). Switch livekit-server, voice-agent, and
voice-service to network_mode: host for zero-overhead networking.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Default silence_duration_ms=350 is too aggressive for Chinese speech,
causing sentences to be fragmented into 1-3 character chunks. Increase
to 800ms and raise VAD threshold to 0.6 so the STT waits longer before
finalizing a turn, producing complete sentences for LLM processing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The OpenAI Realtime STT uses aiohttp WebSocket connections (not httpx),
so the existing httpx verify=False fix does not apply. LiveKit's
http_context creates aiohttp.TCPConnector without ssl=False, causing
SSL certificate verification errors when OPENAI_BASE_URL points to a
proxy with a self-signed certificate.
Monkey-patch http_context._new_session_ctx to inject ssl=False into the
aiohttp connector, fixing the "CERTIFICATE_VERIFY_FAILED" error for
Realtime STT WebSocket connections.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add user-configurable TTS voice and tone style settings that flow from
the Flutter app through the backend to the voice-agent at call time.
## Flutter App (it0_app)
### Domain Layer
- app_settings.dart: Add `ttsVoice` (default: 'coral') and `ttsStyle`
(default: '') fields to AppSettings entity with copyWith support
### Data Layer
- settings_datasource.dart: Add SharedPreferences keys
`settings_tts_voice` and `settings_tts_style` for local persistence
in loadSettings(), saveSettings(), and clearSettings()
### Presentation Layer
- settings_providers.dart: Add `setTtsVoice()` and `setTtsStyle()`
methods to SettingsNotifier for Riverpod state management
- settings_page.dart: Add "语音" settings group between Notifications
and Security groups with:
- Voice picker: 13 OpenAI voices with gender/style labels
(e.g. "女 · 温暖", "男 · 沉稳", "中性") in a BottomSheet
- Style picker: 5 presets (专业干练/温柔耐心/轻松活泼/严肃正式/科幻AI)
as ChoiceChips + custom text input field + reset button
### Call Flow
- agent_call_page.dart: Send `tts_voice` and `tts_style` in the POST
body when requesting a LiveKit token at call initiation
## Backend
### voice-service (Python/FastAPI)
- livekit_token.py: Accept optional `tts_voice` and `tts_style` via
Pydantic TokenRequest body model; embed them in RoomAgentDispatch
metadata JSON alongside auth_header (backward compatible)
### voice-agent (Python/LiveKit Agents)
- agent.py: Extract `tts_voice` and `tts_style` from ctx.job.metadata;
use them when creating openai_plugin.TTS() — user-selected voice
overrides config default, user-selected style overrides default
instructions. Falls back to config defaults when not provided.
## Data Flow
Flutter Settings → SharedPreferences → POST /livekit/token body →
voice-service embeds in RoomAgentDispatch metadata →
voice-agent reads from ctx.job.metadata → TTS creation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Switch from tts-1 to gpt-4o-mini-tts for lower latency and better quality
- Change voice from alloy to coral
- Add Chinese speech instructions for natural tone control
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch from batch STT (gpt-4o-transcribe via /audio/transcriptions)
to streaming Realtime API (WebSocket). This eliminates the ~2s batch
upload+process latency per utterance.
Also updated nginx proxy on 67.223.119.33 to support WebSocket upgrade
for /v1/realtime endpoint.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Pass httpx.AsyncClient(verify=False) to OpenAI STT/TTS to support
self-signed certificate on OPENAI_BASE_URL proxy
- Handle generate_reply calls with no user message by falling back to
system/developer instructions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In livekit-agents v1.x @server.rtc_session() pattern, ctx.room is not
yet connected when entrypoint is called. session.start() handles room
connection internally.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add room_input_options/room_output_options to session.start() so agent
binds audio I/O and stays in the room
- Add wait_for_participant() before starting session
- Filter AgentConfigUpdate items in agent_llm.py (no 'role' attribute)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace deprecated WorkerOptions(entrypoint_fnc=...) with AgentServer() +
@server.rtc_session() decorator. Use server.setup_fnc for prewarm. Remove
manual ctx.connect() and ctx.wait_for_participant() calls that prevented
the pipeline from properly wiring up VAD→STT→LLM→TTS.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Subscriber transport was timing out on DTLS handshake for clients
behind complex NAT (VPN/symmetric NAT). Enable LiveKit's built-in
TURN server on UDP port 3478.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RoomInputOptions is deprecated in livekit-agents 1.4.x. Switch to
RoomOptions with explicit audio_input/audio_output enabled.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use Hardware.instance.setSpeakerphoneOn() to switch between speaker
and earpiece modes. Default to speaker on.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
LiveKit passes RoomAgentDispatch metadata through as job.metadata
(protobuf field), not via a separate agent_dispatch object. Also
use room_io.RoomInputOptions for participant targeting (livekit-agents 1.x).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
livekit-agents 1.x removed the 'participant' parameter from
AgentSession.start(). Use room_input_options with participant_identity
instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
livekit_client 2.6.4 no longer has audioBitrate parameter.
Default AudioPublishOptions auto-selects optimal speech bitrate.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The livekit_ws_url returned in token response needs to be the external
server address, not the internal Docker network name, so Flutter clients
can connect directly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The livekit package is the client SDK and doesn't include the server-side
API module. Switch to livekit-api which provides AccessToken, VideoGrants,
RoomAgentDispatch, and RoomConfiguration needed for token generation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Upgrade websockets from ==12.0 to >=13.0 (openai[realtime] requires >=13)
- Install torch CPU-only build separately in Dockerfile to avoid ~2GB CUDA download
- Remove torch from requirements.txt (installed via --index-url cpu wheel)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FileType.custom with allowedExtensions causes Android system picker
to hide subdirectories on some devices. Changed to FileType.any with
post-selection extension validation instead.
- Unsupported file types are skipped with a SnackBar hint
- Allowed: jpg, jpeg, png, gif, webp, pdf
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude API supports up to 32MB PDFs; base64 encoding adds ~33% overhead.
50mb body limit covers the maximum single-document upload case.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PDF files were incorrectly wrapped as type:'image' content blocks,
causing Claude API to reject them as "Invalid image data".
- conversation-context.service: check mediaType for application/pdf,
use type:'document' block (Anthropic native PDF support) instead
- claude-agent-sdk-engine: detect both 'image' and 'document' blocks
when deciding to build multimodal SDK prompt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- AppBar background transparent, merges with scaffold for seamless look
- toolbarHeight reduced from 64dp to 44dp (~20dp screen space saved)
- scrolledUnderElevation: 0 prevents Material 3 shadow on scroll
- Icons 24→20px with VisualDensity.compact for tighter action buttons
- Title fontSize 16 w600, less visual weight
- Both dark and light themes updated consistently
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Input area redesign (ChatGPT/Claude App style):
- Replace fixed bottom bar with floating pill overlay using Stack+Positioned
- Semi-transparent background (surface 92% opacity) with rounded corners (28px)
- Drop shadow for depth separation from content
- Remove inner TextField border (InputBorder.none) for cleaner look
- ListView bottom padding increased to 80px to leave room under the pill
- Input pill floats 12px from edges, 8px from bottom
History scroll fix:
- Add jump parameter to _scrollToBottom() for instant positioning
- When loading conversation history (empty→many messages), use jumpTo
instead of animateTo to avoid incomplete scroll on large message lists
- Double-frame jumpTo ensures layout settles before final scroll position
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove redundant "从剪贴板粘贴" option from attachment menu (long-press to paste natively)
- Remove super_clipboard dependency (no longer needed)
- Fix timeline vertical line overlapping icon nodes by using dynamic dotRadius
- Dim input field placeholder color to AppColors.textMuted
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
getFile requires two positional args: format and callback.
Wrapped in Completer for async/await usage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add super_clipboard and file_picker dependencies
- Clipboard paste: reads PNG/JPEG image data from system clipboard
- Multi-image: pickMultiImage with remaining count limit
- File picker: supports images (jpg/png/gif/webp) and PDF files
- Updated attachment preview to show file icon for non-image types
- Bottom sheet now shows 4 options: gallery, camera, clipboard, file
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update migration files to include the attachments column for
multimodal image storage. Also add ALTER TABLE migration for
existing deployments.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The direct `import * as express from 'express'` caused a
MODULE_NOT_FOUND error in the Docker production image since express
is only available as a transitive dependency via @nestjs/platform-express.
Use NestExpressApplication.useBodyParser() which is the official NestJS API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SDK engine now constructs AsyncIterable<SDKUserMessage> with image
content blocks when attachments are present in conversationHistory,
using the SDK's native multimodal prompt format
- CLI engine logs a warning when images are detected, since the `-p`
flag only accepts text (upstream Claude CLI limitation)
- Both SDK and API engines now fully support multimodal image input
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two major features in this commit:
1. Streaming Markdown Rendering Optimization
- Replace deprecated flutter_markdown with gpt_markdown (active, AI-optimized)
- Real-time markdown rendering during streaming (was showing raw syntax)
- Solid block cursor (█) instead of AnimationController blink
- 80ms token throttle buffer reducing rebuilds from per-token to ~12.5/sec
- RepaintBoundary isolation for markdown widget repaints
- StreamTextWidget simplified from StatefulWidget to StatelessWidget
2. Multimodal Image Input (camera + gallery + display)
- Flutter: image_picker for gallery/camera, base64 encoding, attachment
preview strip with delete, thumbnails in sent messages
- Data layer: List<String>? → List<Map<String, dynamic>>? for structured
attachment payloads through datasource/repository/usecase
- ChatAttachment model with base64Data, mediaType, fileName
- ChatMessage entity + ChatMessageModel both support attachments field
- Backend DTO, Entity (JSONB), Controller, ConversationContextService
all extended to receive, store, and reconstruct Anthropic image
content blocks in loadContext()
- Claude API engine skips duplicate user message when history already
ends with multimodal content blocks
- NestJS body parser limit raised to 10MB for base64 image payloads
- Android CAMERA permission added to manifest
- Image.memory uses cacheWidth/cacheHeight for memory efficiency
- Max 5 images per message enforced in UI
Data flow:
ImagePicker → base64Encode → ChatAttachment → POST body →
DB (JSONB) → loadContext → Anthropic image content blocks → Claude API
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three root causes fixed:
1. TimelineEventNode: Replaced IntrinsicHeight (which forces intrinsic
height calculation on unbounded content) with CustomPaint-based
_TimelineLinePainter that draws vertical lines based on actual
rendered widget size. Also added maxLines/ellipsis to label text
and mainAxisSize.min on inner Column.
2. ApprovalActionCard: Changed countdown + action buttons layout from
Row with Spacer (which requires infinite width) to Wrap with
spacing, preventing horizontal overflow on narrow screens.
3. AnimatedCrossFade in _CollapsibleCodeBlock and _CollapsibleThinking:
Wrapped with ClipRect and added sizeCurve: Curves.easeInOut to
prevent the outgoing child from extending beyond parent bounds
during the cross-fade transition.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6 rounds of systematic audit identified and fixed 14 bugs across
backend controller and Flutter client:
## Backend (agent.controller.ts)
Security & Tenant Isolation:
- Add @TenantId + ForbiddenException check to cancelTask, injectMessage,
approveCommand — all 4 write endpoints now enforce tenant isolation
- Add tenantId check on session reuse in executeTask to prevent
cross-tenant session hijacking
Architecture & Correctness:
- Extract shared runTaskStream() from inline fire-and-forget block,
used by both executeTask and injectMessage to reduce duplication
- Use session.engineType (not getActiveEngine()) in cancelTask,
injectMessage, approveCommand — fixes wrong-engine-cancel when
global engine config is switched after task creation
- Add concurrent task prevention: executeTask checks for existing
RUNNING task on same session and cancels it before starting new one
- Add runningTasks Map to track task promises, awaitTaskCleanup()
helper with 3s timeout for inject to wait for partial text save
- captureSdkSessionId() captures SDK session ID into metadata
without DB save (callers persist), preventing fire-and-forget race
Cancel/Reject Improvements:
- cancelTask: idempotent (returns early if already CANCELLED/COMPLETED),
session stays 'active' (was 'cancelled'), emits cancelled WS event
- approveCommand reject: session stays 'active' (was 'cancelled'),
now emits cancelled WS event so Flutter stream listeners clean up
- approveCommand approved: collect text events and save assistant
response to conversation history on completion (was missing)
Minor:
- task.result! non-null assertion → task.result ?? 'Unknown error'
- Add findRunningBySessionId() to TaskRepository
## Flutter
API Contract Fix:
- approveCommand: route changed from /api/v1/ops/approvals/:id/approve
to /api/v1/agent/tasks/:id/approve with {approved: true} body
- rejectCommand: route changed from /api/v1/ops/approvals/:id/reject
to /api/v1/agent/tasks/:id/approve with {approved: false} body
Resource Management:
- ChatNotifier.dispose() now disconnects WebSocket to prevent
connection leak when navigating away from chat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend (agent-engine.port.ts):
- Add `cancelled` event type: emitted when a task is cancelled (user-initiated
or injection), so Flutter can close the old stream cleanly
- Add `task_info` event type: emitted after inject to pass the new taskId to
the client, enabling cancel/re-inject on the replacement task
Flutter (features/chat/):
- ChatState: track current `taskId` alongside `sessionId`; clear on completion
or error
- Handle `TaskInfoEvent`: update taskId in state when server issues a new task
- Handle `CancelledEvent`: treat as stream termination (agentStatus → idle)
- MessageType.interrupted: new UI node (warning style) for mid-stream cancels
- _inject(): send text as an inject request while streaming; backend cancels
the current task and starts a new one with the injected message
- Input area: during streaming, hint changes to "追加指令...", Enter key calls
_inject() instead of _send(), and both inject-send + stop buttons are shown
- isAwaitingApproval kept separate from isStreaming so approval flow is not
blocked by inject mode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously AgentSkillService wrote skills to public.agent_skills (TypeORM
entity with tenantId column filter), while ClaudeAgentSdkEngine read from
it0_t_{tenantId}.skills (per-tenant schema). The two tables were never
connected, so any skill added via the CRUD API was invisible to the agent.
This fix:
- Rewrites AgentSkillService to use DataSource + raw SQL against the
per-tenant schema it0_t_{tenantId}.skills
- Maps API fields: script→content, enabled→is_active
- Removes AgentSkillRepository and AgentSkill entity from module (no longer needed)
- CRUD API response shape is unchanged (fields mapped back to script/enabled)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Load active skills from the tenant's schema `skills` table and append
them to the system prompt before passing to the Claude Agent SDK. This
closes the gap where skills existed in the DB but were never surfaced
to the agent during task execution.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove iproute2/NET_ADMIN (no longer needed)
- Remove ip route hack from entrypoint.sh
- rwa-colocation-2 server record updated to use Docker gateway IP
since 14.215.128.96 is a host-local NIC on the IT0 server
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
14.215.128.96 is bound to a host NIC (enp5s0) and unreachable from
Docker bridge via default NAT. Add NET_ADMIN + ip route via gateway.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The IT0 server has its own id_ed25519 which differs from the local
key that's authorized on RWADurian servers. Use a dedicated key file.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>