326 Commits

Author SHA1 Message Date
hailin e32a3a9800 fix: use @TenantId() decorator in VoiceConfigController for JWT tenant extraction
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 22:30:37 -08:00
hailin f9c47de04b feat: add STT provider switching (OpenAI ↔ Speechmatics) in settings
- Add VoiceConfig entity/repo/service/controller in agent-service
  for per-tenant STT provider persistence (default: speechmatics)
- Add Speechmatics STT plugin in voice-agent with livekit-plugins-speechmatics
- Modify voice-agent entrypoint for 3-way STT selection:
  metadata > agent-service config > env var fallback
- Add "Voice" section in web-admin settings page with STT provider dropdown
- Add i18n translations (en/zh) for voice settings
- Add SPEECHMATICS_API_KEY env var in docker-compose

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 22:13:18 -08:00
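Note on f9c47de04b: the selection order "metadata > agent-service config > env var fallback" can be sketched as below. This is a minimal illustration, not the code from the commit. The key names `stt_provider` and `sttProvider` and the final default are assumptions; only the priority order, the two provider names, and the `STT_PROVIDER` variable come from the log.

```python
import os
from typing import Mapping, Optional

# Providers named in the commit message.
SUPPORTED_STT = {"openai", "speechmatics"}


def resolve_stt_provider(
    metadata: Mapping[str, object],
    tenant_config: Optional[Mapping[str, object]],
    env: Mapping[str, str] = os.environ,
    default: str = "speechmatics",  # assumed final fallback
) -> str:
    """Return the first valid provider in priority order."""
    candidates = (
        metadata.get("stt_provider"),             # 1. per-call dispatch metadata (hypothetical key)
        (tenant_config or {}).get("sttProvider"),  # 2. per-tenant config (hypothetical key)
        env.get("STT_PROVIDER"),                   # 3. environment variable
    )
    for value in candidates:
        if isinstance(value, str) and value.lower() in SUPPORTED_STT:
            return value.lower()
    return default
```

Unknown or missing values fall through to the next source, so a misspelled provider in one layer does not break the call.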
hailin 7cb185e0cd fix: remove RunbookExecution data wrapper type in runbook detail page
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 18:50:38 -08:00
hailin bf68ceccbc fix: remove PaginatedResponse wrapper in communication page
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 18:48:17 -08:00
hailin 07e6c5671d fix: resolve remaining .total and .data references after response format migration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 18:29:59 -08:00
hailin ee9383d301 fix: audit logs page - use array length instead of .total property
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 18:22:44 -08:00
hailin 146d271dc3 fix: convert Response wrapper interfaces to direct array types
Backend APIs return arrays directly, not { data, total } wrappers.
Changed 21 interface declarations to type aliases matching actual
API response format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 09:54:33 -08:00
hailin 10e0b0ce29 fix: remove incorrect .data wrapper — backend returns arrays directly
All pages expected API responses in { data: [], total } format but
backend APIs return plain arrays. Changed data?.data ?? [] to data ?? []
across 22 page components.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 09:47:22 -08:00
hailin d21f41d7c3 fix: auto-redirect to login on 401 Unauthorized
When API returns 401, clear stored tokens and redirect to /login
instead of showing an error message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 09:40:58 -08:00
hailin 375c5632f6 fix: correct all web-admin API endpoint URLs to match backend routes
The web-admin frontend was calling incorrect API paths that didn't match
the actual backend service routes through Kong gateway, causing all
requests to fail with 404 or route-mismatch errors.

URL corrections:
- servers: /api/v1/servers → /api/v1/inventory/servers
- runbooks: /api/v1/runbooks → /api/v1/ops/runbooks
- risk-rules: /api/v1/security/risk-rules → /api/v1/agent/risk-rules
- credentials: /api/v1/security/credentials → /api/v1/inventory/credentials
- roles: /api/v1/security/roles → /api/v1/auth/roles
- permissions: /api/v1/security/permissions → /api/v1/auth/permissions
- tenants: /api/v1/tenants → /api/v1/admin/tenants
- communication: /api/v1/communication → /api/v1/comm

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 09:29:51 -08:00
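Note on 375c5632f6: the URL corrections listed in this commit form a simple prefix table. The sketch below restates them in Python (the real change was made in the web-admin frontend) so they can be checked mechanically. The helper name `corrected_path` is hypothetical.

```python
# Old prefix -> corrected prefix, copied from the commit message.
ROUTE_PREFIXES = {
    "/api/v1/servers": "/api/v1/inventory/servers",
    "/api/v1/runbooks": "/api/v1/ops/runbooks",
    "/api/v1/security/risk-rules": "/api/v1/agent/risk-rules",
    "/api/v1/security/credentials": "/api/v1/inventory/credentials",
    "/api/v1/security/roles": "/api/v1/auth/roles",
    "/api/v1/security/permissions": "/api/v1/auth/permissions",
    "/api/v1/tenants": "/api/v1/admin/tenants",
    "/api/v1/communication": "/api/v1/comm",
}


def corrected_path(path: str) -> str:
    """Rewrite a stale frontend path to its corrected backend route."""
    for old, new in ROUTE_PREFIXES.items():
        # Match the whole prefix or a prefix followed by a path segment.
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    return path  # already correct or not covered by the table
```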
hailin 94e3153e39 chore: remove debug data_received logging
2026-03-02 06:50:41 -08:00
hailin 81e36bf859 debug: add data_received event logging to diagnose data channel
2026-03-02 06:38:02 -08:00
hailin 63b986fced fix: redesign voice call mixed-mode input with dual-layout architecture
Problem:
- Text input area caused BOTTOM OVERFLOWED BY 135 PIXELS when keyboard opened
- Input bar overlapped with call control buttons
- Sent messages were not displayed on screen (only SnackBar feedback)

Solution — split into two distinct layouts:

1. Call Mode (default):
   - Full-screen call UI: avatar, waveform, duration, large control buttons
   - Keyboard button in controls toggles to chat mode
   - No text input elements — clean voice-only interface

2. Chat Mode (tap keyboard button):
   - Compact call header: green status dot + "iAgent" + duration + inline
     mute/end/speaker/collapse controls
   - Scrollable message list (Expanded widget — properly handles keyboard)
   - User messages: right-aligned blue bubbles with attachment thumbnails
   - Agent responses: left-aligned gray bubbles with robot avatar
   - Input bar at bottom: attachment picker + text field + send button

Message display:
- User-sent text/attachments tracked in _messages list, shown as bubbles
- Agent responses sent back via LiveKit data channel (topic='text_reply')
  from voice-agent → Flutter, displayed as assistant bubbles
- Auto-scroll to latest message

Voice-agent change (agent.py):
- After session.say(response), publish response text back to Flutter via
  ctx.room.local_participant.publish_data() with topic='text_reply'
- Flutter listens for DataReceivedEvent to display agent responses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 06:11:07 -08:00
hailin ce63ece340 feat: add mixed-mode input (text + images + files) during voice calls
Enable users to send text messages, images, and files to the Agent
while an active voice call is in progress. This addresses the case
where spoken instructions are unclear or screenshots/documents need
to be shared for analysis.

## Architecture

Data flows through LiveKit data channel (not direct HTTP):
  Flutter → publishData(topic='text_inject') → voice-agent
  → llm.inject_text_message() → POST /api/v1/agent/tasks (same session)
  → collect streamed response → session.say() → TTS playback

This preserves the constraint that voice-agent owns the agent-service
sessionId — Flutter never contacts agent-service directly.

## Flutter UI (agent_call_page.dart)
- Add keyboard toggle button to active call controls (4-button row)
- Collapsible text input area with attachment picker (+) and send button
- Attachment support: gallery multi-select, camera, file picker
  (images max 1024x1024 quality 80%, PDF supported, max 5 attachments)
- Horizontal scrolling attachment preview with delete buttons
- 200KB payload size check before LiveKit data channel send
- Layout adapts: Spacer flex 1/3 toggle, reduced bottom padding

## voice-agent (agent.py)
- Register data_received event listener after session.start()
- Filter for topic='text_inject', parse JSON payload
- Call llm.inject_text_message(text, attachments) and TTS via session.say()
- Use asyncio.ensure_future() wrapper for async handler (matches
  existing disconnect handler pattern for sync EventEmitter)

## AgentServiceLLM (agent_llm.py)
- New inject_text_message(text, attachments) method on AgentServiceLLM
- Reuses same _agent_session_id for conversation context continuity
- WS+HTTP streaming: connect, pre-subscribe, POST /tasks with
  attachments field, collect full text response, return string
- _injecting flag prevents concurrent _do_stream from clearing
  session ID on abort errors while inject is in progress
- Same systemPrompt/voiceMode/engineType as voice pipeline

No agent-service changes required — attachments already supported
end-to-end (JSONB storage → multimodal content blocks → Claude).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 05:38:04 -08:00
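Note on ce63ece340: the `text_inject` message and the 200KB size check can be sketched as follows. The JSON field names (`text`, `attachments`) and reading "200KB" as 200 * 1024 bytes are assumptions. The log only states the topic name, the size limit, and that the payload is JSON. The actual send and receive calls go through LiveKit and are not shown.

```python
import json
from typing import Any, Dict, List, Optional

MAX_PAYLOAD_BYTES = 200 * 1024  # assumed interpretation of the 200KB limit


def encode_text_inject(text: str, attachments: List[Dict[str, Any]]) -> bytes:
    """Sender side: serialize the message and enforce the size limit."""
    payload = json.dumps(
        {"text": text, "attachments": attachments}, ensure_ascii=False
    ).encode("utf-8")
    if len(payload) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(payload)} bytes, limit is {MAX_PAYLOAD_BYTES}")
    return payload


def decode_text_inject(topic: str, data: bytes) -> Optional[Dict[str, Any]]:
    """Receiver side: ignore other topics and malformed payloads."""
    if topic != "text_inject":
        return None
    try:
        message = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(message, dict):
        return None
    return {
        "text": str(message.get("text", "")),
        "attachments": message.get("attachments") or [],
    }
```

Checking the encoded byte length rather than the character count matters here, since Chinese text and base64 attachments differ widely in bytes per character.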
hailin 02aaf40bb2 fix: move voice instructions to systemPrompt, keep prompt clean
Previously, voice mode wrapped every user message with 【语音对话模式】 ("voice conversation mode")
instructions, polluting conversation_messages history with repeated
instructions on every turn. Now:

- systemPrompt carries voice-mode instructions (set once, not per-message)
- prompt contains only the clean user text (identical to text chat pattern)
- Conversation history stays clean for multi-turn context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 03:24:50 -08:00
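Note on 02aaf40bb2: the split between `systemPrompt` and `prompt` might look like the sketch below. The field names `prompt` and `systemPrompt` appear in the log. The instruction text and the helper `build_task_body` are placeholders, not the project's actual values.

```python
from typing import Dict

# Placeholder text; the real voice-mode instructions are not quoted in the log.
VOICE_MODE_INSTRUCTIONS = "Voice conversation mode: reply in short spoken sentences."


def build_task_body(user_text: str, *, voice_mode: bool, base_system_prompt: str = "") -> Dict[str, str]:
    """Keep `prompt` as the clean user text; put voice rules in `systemPrompt`."""
    body = {"prompt": user_text}
    system_parts = [base_system_prompt] if base_system_prompt else []
    if voice_mode:
        system_parts.append(VOICE_MODE_INSTRUCTIONS)
    if system_parts:
        body["systemPrompt"] = "\n\n".join(system_parts)
    return body
```

Because the instructions never enter `prompt`, the stored conversation history contains only what the user actually said.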
hailin da17488389 feat: voice mode event filtering — skip tool/thinking events for Agent SDK
1. Remove on_enter greeting entirely (no more race condition)
2. voice-agent sends voiceMode: true when engine_type is claude_agent_sdk
3. AgentController.runTaskStream() filters thinking, tool_use, tool_result
   events in voice mode — only text, completed, error reach the client
4. Detailed logging: each event logged with [FILTERED-voice] tag when skipped

Claude API mode is completely unaffected (voiceMode defaults to false).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 02:56:41 -08:00
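Note on da17488389: the voice-mode filter drops `thinking`, `tool_use`, and `tool_result` events and lets everything else through. The sketch below restates that rule in Python for illustration. The real filter lives in `AgentController.runTaskStream()` in agent-service, and the event shape (a dict with a `type` key) is an assumption.

```python
import logging
from typing import Any, Dict, Iterable, Iterator

logger = logging.getLogger("voice-filter")

# Event types skipped in voice mode, per the commit message.
VOICE_FILTERED_TYPES = {"thinking", "tool_use", "tool_result"}


def filter_voice_events(
    events: Iterable[Dict[str, Any]], voice_mode: bool = False
) -> Iterator[Dict[str, Any]]:
    """Yield events, skipping non-spoken ones when voice_mode is on."""
    for event in events:
        if voice_mode and event.get("type") in VOICE_FILTERED_TYPES:
            logger.debug("[FILTERED-voice] %s", event.get("type"))
            continue
        yield event
```

With `voice_mode` left at its default of `False`, the stream passes through unchanged, which matches the statement that Claude API mode is unaffected.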
hailin 7c9fabd891 fix: avoid Agent SDK race on greeting + clear session on abort
1. Change on_enter greeting from generate_reply() to session.say() with
   a static message — avoids spawning an Agent SDK task just for a greeting,
   which caused a race condition when the user speaks before it completes.

2. Clear agent session ID when receiving abort/exit errors so the next
   task starts a fresh session instead of trying to resume a dead process.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 02:22:52 -08:00
hailin a78e2cd923 chore: add detailed engine type logging for verification
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 02:18:29 -08:00
hailin 59a3e60b82 feat: add engine type selection (Agent SDK / Claude API) for voice calls
Full-stack implementation allowing users to choose between Claude Agent SDK
(default, with tool approval, skill injection, session resume) and Claude API
(direct, lower latency) in Flutter settings. Agent SDK mode wraps prompts with
voice-conversation instructions for concise spoken Chinese output.

Data flow: Flutter Settings → SharedPreferences → POST /livekit/token →
RoomAgentDispatch metadata → voice-agent → AgentServiceLLM(engine_type)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 02:11:51 -08:00
hailin c9e196639a fix: add missing subscribe parameter to Timeouts constructor
All 6 Timeouts parameters are required in livekit_client 2.6.4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 23:52:49 -08:00
hailin 3fd27ff190 fix: add required debounce parameter to Timeouts constructor
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 23:42:54 -08:00
hailin e66c187353 fix: improve voice pipeline robustness for poor network conditions
Flutter (agent_call_page.dart):
- Add ConnectOptions with 15s timeouts for connection/peerConnection/iceRestart
- Add RoomReconnectingEvent/RoomAttemptReconnectEvent/RoomReconnectedEvent
  listeners with "网络重连中" ("Network reconnecting") UI indicator during reconnection
- Add TimeoutException detection in _friendlyError()

voice-agent (agent.py):
- Wrap entrypoint() in try-except with full traceback logging
- Register room disconnect listener to close httpx clients (instead of
  finally block, since session.start() returns while session runs in bg)
- Add asyncio import for ensure_future cleanup

voice-agent LLM proxy (agent_llm.py):
- Add retry with exponential backoff (max 2 retries, 1s/3s delays) for
  network errors (ConnectError/ConnectTimeout/OSError) and WS InvalidStatusCode
- Extract _do_stream() method for single-attempt logic
- Add WebSocket connection params: open_timeout=10, ping_interval=20,
  ping_timeout=10 for keepalive and faster dead-connection detection
- Use granular httpx.Timeout(connect=10, read=30, write=10, pool=10)
- Increase WS recv timeout from 5s to 30s to reduce unnecessary loops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 23:34:55 -08:00
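Note on e66c187353: the retry policy (at most 2 retries, with 1s and 3s delays) can be sketched with the standard library alone. The helper name `with_retry` and the injectable `sleep` parameter are illustrative. The real code would also list the httpx and websockets exception types named in the commit, which are omitted here to keep the example dependency-free.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")

RETRY_DELAYS = (1.0, 3.0)  # two retries, delays taken from the commit message


async def with_retry(
    attempt: Callable[[], Awaitable[T]],
    *,
    delays: Sequence[float] = RETRY_DELAYS,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run attempt(); on a retryable error wait and try again."""
    for delay in delays:
        try:
            return await attempt()
        except retry_on:
            await sleep(delay)
    # Final attempt: any error now propagates to the caller.
    return await attempt()
```

Keeping the single-attempt logic in its own coroutine, as the commit does with `_do_stream()`, is what makes a wrapper like this straightforward to apply.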
hailin 32922c6819 fix: adjust TTS default instructions for faster speech tempo
Changed from "语速适中" ("moderate pace") to "语速稍快,简洁干练" ("slightly faster
pace, concise and crisp") to reduce perceived latency in voice conversations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 22:09:32 -08:00
hailin fb236de6e4 fix: set LiveKit node_ip to China IP for domestic WebRTC connectivity
LiveKit's use_external_ip auto-detected 154.84.135.121 (overseas) via
STUN, causing WebRTC ICE candidates to use an unreachable IP for
domestic mobile clients. Explicitly set node_ip to 14.215.128.96.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 21:51:17 -08:00
hailin 8a48e92970 fix: use domain names for API access, China IP for LiveKit
Flutter app now uses https://it0api.szaiai.com (nginx reverse proxy)
instead of direct IP:port. LiveKit URL uses China IP 14.215.128.96
for lower latency from domestic mobile clients.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 21:44:25 -08:00
hailin 7fb0168dc5 fix: keep voice-service on bridge networking to avoid port conflict
iconsulting-llm-gateway already occupies port 3008 on the host.
voice-service only has a single TCP port (no docker-proxy overhead),
so bridge networking with 13008:3008 mapping is sufficient.
Only livekit-server and voice-agent need host mode (UDP port ranges).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 20:23:13 -08:00
hailin 68ee2516d5 fix: use host networking for voice services to eliminate docker-proxy overhead
Bridge mode created 600+ docker-proxy processes for LiveKit's UDP port-range
mappings (30000-30100, 50000-50200). Switch livekit-server, voice-agent, and
voice-service to network_mode: host for zero-overhead networking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 19:58:32 -08:00
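Note on 68ee2516d5: the "600+" figure is consistent with the two published UDP ranges if Docker starts one docker-proxy process per published port for each of IPv4 and IPv6. That per-address-family behavior is an assumption about the host's Docker setup, not something the log states. The arithmetic:

```python
from typing import Iterable, Tuple

# UDP ranges from the commit message (inclusive on both ends).
UDP_RANGES = [(30000, 30100), (50000, 50200)]


def published_ports(ranges: Iterable[Tuple[int, int]]) -> int:
    """Count ports across inclusive (first, last) ranges."""
    return sum(last - first + 1 for first, last in ranges)


ADDRESS_FAMILIES = 2  # assumption: one proxy each for IPv4 and IPv6

total_ports = published_ports(UDP_RANGES)          # 101 + 201
estimated_proxies = total_ports * ADDRESS_FAMILIES
```

Under that assumption the estimate comes to 604 processes for 302 ports.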
hailin 186234bae2 fix: increase STT silence_duration_ms to prevent choppy transcription
Default silence_duration_ms=350 is too aggressive for Chinese speech,
causing sentences to be fragmented into 1-3 character chunks. Increase
to 800ms and raise VAD threshold to 0.6 so the STT waits longer before
finalizing a turn, producing complete sentences for LLM processing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 18:37:13 -08:00
hailin a5c95b460a fix: patch aiohttp SSL verification for OpenAI Realtime STT WebSocket
The OpenAI Realtime STT uses aiohttp WebSocket connections (not httpx),
so the existing httpx verify=False fix does not apply. LiveKit's
http_context creates aiohttp.TCPConnector without ssl=False, causing
SSL certificate verification errors when OPENAI_BASE_URL points to a
proxy with a self-signed certificate.

Monkey-patch http_context._new_session_ctx to inject ssl=False into the
aiohttp connector, fixing the "CERTIFICATE_VERIFY_FAILED" error for
Realtime STT WebSocket connections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 18:29:59 -08:00
hailin 5460be8c04 feat: add TTS voice and style settings to Flutter app
Add user-configurable TTS voice and tone style settings that flow from
the Flutter app through the backend to the voice-agent at call time.

## Flutter App (it0_app)

### Domain Layer
- app_settings.dart: Add `ttsVoice` (default: 'coral') and `ttsStyle`
  (default: '') fields to AppSettings entity with copyWith support

### Data Layer
- settings_datasource.dart: Add SharedPreferences keys
  `settings_tts_voice` and `settings_tts_style` for local persistence
  in loadSettings(), saveSettings(), and clearSettings()

### Presentation Layer
- settings_providers.dart: Add `setTtsVoice()` and `setTtsStyle()`
  methods to SettingsNotifier for Riverpod state management
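- settings_page.dart: Add "语音" ("Voice") settings group between Notifications
  and Security groups with:
  - Voice picker: 13 OpenAI voices with gender/style labels
    (e.g. "女 · 温暖" (female · warm), "男 · 沉稳" (male · steady), "中性" (neutral))
    in a BottomSheet
  - Style picker: 5 presets (专业干练/温柔耐心/轻松活泼/严肃正式/科幻AI, i.e.
    professional and crisp / gentle and patient / relaxed and lively /
    serious and formal / sci-fi AI)
    as ChoiceChips + custom text input field + reset button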

### Call Flow
- agent_call_page.dart: Send `tts_voice` and `tts_style` in the POST
  body when requesting a LiveKit token at call initiation

## Backend

### voice-service (Python/FastAPI)
- livekit_token.py: Accept optional `tts_voice` and `tts_style` via
  Pydantic TokenRequest body model; embed them in RoomAgentDispatch
  metadata JSON alongside auth_header (backward compatible)

### voice-agent (Python/LiveKit Agents)
- agent.py: Extract `tts_voice` and `tts_style` from ctx.job.metadata;
  use them when creating openai_plugin.TTS() — user-selected voice
  overrides config default, user-selected style overrides default
  instructions. Falls back to config defaults when not provided.

## Data Flow
Flutter Settings → SharedPreferences → POST /livekit/token body →
voice-service embeds in RoomAgentDispatch metadata →
voice-agent reads from ctx.job.metadata → TTS creation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 09:38:15 -08:00
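Note on 5460be8c04: reading `tts_voice` and `tts_style` from the dispatch metadata with fallback to defaults could look like the sketch below. The keys `tts_voice`, `tts_style`, and `auth_header` and the default voice `coral` come from the log. The helper `resolve_tts_options` is hypothetical, and the default instructions are passed in rather than guessed.

```python
import json
from typing import Optional, Tuple

DEFAULT_TTS_VOICE = "coral"  # default voice named in the commit message


def resolve_tts_options(
    raw_metadata: Optional[str],
    default_instructions: str,
    default_voice: str = DEFAULT_TTS_VOICE,
) -> Tuple[str, str]:
    """Return (voice, instructions), preferring user-selected values."""
    try:
        metadata = json.loads(raw_metadata) if raw_metadata else {}
    except json.JSONDecodeError:
        metadata = {}
    if not isinstance(metadata, dict):
        metadata = {}
    # Empty strings count as "not provided" (ttsStyle defaults to '').
    voice = metadata.get("tts_voice") or default_voice
    instructions = metadata.get("tts_style") or default_instructions
    return voice, instructions
```

Treating missing keys, empty strings, and unparsable metadata the same way keeps older clients working, which is the backward compatibility the commit describes.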
hailin 2dc361f7a0 chore: update docker-compose TTS defaults to gpt-4o-mini-tts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 08:44:17 -08:00
hailin 705647d732 feat: upgrade TTS to gpt-4o-mini-tts with voice instructions
- Switch from tts-1 to gpt-4o-mini-tts for lower latency and better quality
- Change voice from alloy to coral
- Add Chinese speech instructions for natural tone control

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 08:19:05 -08:00
hailin ba83e433d3 feat: enable OpenAI Realtime STT for streaming speech recognition
Switch from batch STT (gpt-4o-transcribe via /audio/transcriptions)
to streaming Realtime API (WebSocket). This eliminates the ~2s batch
upload+process latency per utterance.

Also updated nginx proxy on 67.223.119.33 to support WebSocket upgrade
for /v1/realtime endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 07:49:25 -08:00
hailin e302891f16 fix: disable SSL verify for self-signed OpenAI proxy + handle no-user-msg
- Pass httpx.AsyncClient(verify=False) to OpenAI STT/TTS to support
  self-signed certificate on OPENAI_BASE_URL proxy
- Handle generate_reply calls with no user message by falling back to
  system/developer instructions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:39:49 -08:00
hailin 4d47c6a955 fix: remove wait_for_participant — room not connected in rtc_session mode
In livekit-agents v1.x @server.rtc_session() pattern, ctx.room is not
yet connected when entrypoint is called. session.start() handles room
connection internally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:15:37 -08:00
hailin 2112445191 fix: voice-agent crash — add room I/O options and filter AgentConfigUpdate
- Add room_input_options/room_output_options to session.start() so agent
  binds audio I/O and stays in the room
- Add wait_for_participant() before starting session
- Filter AgentConfigUpdate items in agent_llm.py (no 'role' attribute)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:08:07 -08:00
hailin 00be878a95 fix: refactor voice-agent to official LiveKit v1.x AgentServer pattern
Replace deprecated WorkerOptions(entrypoint_fnc=...) with AgentServer() +
@server.rtc_session() decorator. Use server.setup_fnc for prewarm. Remove
manual ctx.connect() and ctx.wait_for_participant() calls that prevented
the pipeline from properly wiring up VAD→STT→LLM→TTS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 12:31:31 -08:00
hailin cf60b8733f fix: expose TURN relay ports for NAT traversal
Limit TURN relay range to 30000-30100 and expose via docker-compose.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:39:50 -08:00
hailin 2f0cb13ecb fix: enable built-in TURN server for NAT traversal
Subscriber transport was timing out on DTLS handshake for clients
behind complex NAT (VPN/symmetric NAT). Enable LiveKit's built-in
TURN server on UDP port 3478.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:37:21 -08:00
hailin 75b14d5200 fix: use RoomOptions instead of deprecated RoomInputOptions
RoomInputOptions is deprecated in livekit-agents 1.4.x. Switch to
RoomOptions with explicit audio_input/audio_output enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:32:36 -08:00
hailin 46a2d06be3 fix: implement speaker/earpiece toggle on voice call page
Use Hardware.instance.setSpeakerphoneOn() to switch between speaker
and earpiece modes. Default to speaker on.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:11:29 -08:00
hailin 23b5bce983 fix: extract auth header from job.metadata instead of agent_dispatch
LiveKit passes RoomAgentDispatch metadata through as job.metadata
(protobuf field), not via a separate agent_dispatch object. Also
use room_io.RoomInputOptions for participant targeting (livekit-agents 1.x).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:04:02 -08:00
hailin f1d50e43f1 fix: update AgentSession.start() for livekit-agents 1.x API
livekit-agents 1.x removed the 'participant' parameter from
AgentSession.start(). Use room_input_options with participant_identity
instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 10:31:04 -08:00
hailin 19efeec26d fix: remove unsupported audioBitrate param from AudioPublishOptions
livekit_client 2.6.4 no longer has audioBitrate parameter.
Default AudioPublishOptions auto-selects optimal speech bitrate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 10:13:11 -08:00
hailin 2ce0e7cdd4 fix: use external LiveKit URL in voice-service config
The livekit_ws_url returned in token response needs to be the external
server address, not the internal Docker network name, so Flutter clients
can connect directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 10:00:26 -08:00
hailin acfdae7773 fix: use livekit-api package for voice-service token endpoint
The livekit package is the client SDK and doesn't include the server-side
API module. Switch to livekit-api which provides AccessToken, VideoGrants,
RoomAgentDispatch, and RoomConfiguration needed for token generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 09:49:11 -08:00
hailin 112c445143 fix: resolve websockets version conflict and use CPU-only torch
- Upgrade websockets from ==12.0 to >=13.0 (openai[realtime] requires >=13)
- Install torch CPU-only build separately in Dockerfile to avoid ~2GB CUDA download
- Remove torch from requirements.txt (installed via --index-url cpu wheel)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 09:02:31 -08:00
hailin 94a14b3104 feat: migrate voice call from WebSocket/PCM to LiveKit WebRTC
Real-time voice conversation architecture migration: WebSocket → LiveKit WebRTC

## Background
The original voice call architecture streamed raw PCM over a FastAPI WebSocket and ran
the pipeline serially (VAD → batch STT → Agent → sentence buffering → batch TTS), with a
time to first audio of about 6 seconds. After migrating to the LiveKit Agents framework,
WebRTC transport plus pipeline parallelism is expected to cut latency to 1.5-2 seconds.

## Architecture
Flutter App ←── WebRTC (Opus/UDP) ──→ LiveKit Server ←──→ Voice Agent
  livekit_client                      (self-hosted, Go)  (Python, LiveKit Agents SDK)
                                                          ├─ VAD (Silero)
                                                          ├─ STT (faster-whisper / OpenAI)
                                                          ├─ LLM (custom plugin → agent-service)
                                                          └─ TTS (Kokoro / OpenAI)

Key design: the LLM does not call the Claude API directly. A custom plugin proxies to the
existing agent-service, preserving Tool Use, session history, and tenant isolation.

## New services

### voice-agent (packages/services/voice-agent/)
LiveKit Agent Worker, containing:
- agent.py: entry point; prewarm() preloads models, entrypoint() orchestrates the session
- plugins/agent_llm.py: custom LLM plugin that proxies the agent-service API
  - POST /api/v1/agent/tasks creates a task
  - WS /ws/agent subscribes to streaming events (stream_event)
  - Reuses session_id across turns to preserve conversation context
- plugins/whisper_stt.py: local faster-whisper STT (batch recognition)
- plugins/kokoro_tts.py: local Kokoro-82M TTS (24kHz PCM)
- config.py: pydantic-settings configuration

### LiveKit Server (deploy/docker/)
- livekit.yaml: signaling port 7880, RTC TCP 7881, UDP 50000-50200
- docker-compose.yml: add livekit-server + voice-agent containers

### LiveKit Token endpoint
- voice-service/src/api/livekit_token.py:
  POST /api/v1/voice/livekit/token
  Generates a Room JWT and embeds auth_header in the AgentDispatch metadata

## Flutter client changes
- agent_call_page.dart: simplified from ~814 lines to ~380 lines
  - Replaced: WebSocketChannel, AudioRecorder, PcmPlayer, manual heartbeat/reconnect
  - Now uses: Room.connect(), setMicrophoneEnabled(true), LiveKit event listeners
  - Waveform animation now driven by participant.audioLevel
- pubspec.yaml: add livekit_client: ^2.3.0
- app_config.dart: add livekitUrl field
- api_endpoints.dart: add livekitToken endpoint

## Configuration (environment variables)
- STT_PROVIDER: local (default, faster-whisper) / openai
- TTS_PROVIDER: local (default, Kokoro) / openai
- WHISPER_MODEL: base (default) / small / medium / large
- WHISPER_LANGUAGE: zh (default)
- KOKORO_VOICE: zf_xiaoxiao (default)
- DEVICE: cpu (default) / cuda

## Unchanged
- agent-service: no changes; voice-agent calls it through the existing API
- voice-service core: pipeline/STT/TTS/VAD retained (as the Twilio fallback)
- Kong gateway: existing routes unchanged
- Database: no schema changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 08:55:33 -08:00
hailin 7e44ddc358 fix: file picker now shows subdirectories on Android
FileType.custom with allowedExtensions causes Android system picker
to hide subdirectories on some devices. Changed to FileType.any with
post-selection extension validation instead.

- Unsupported file types are skipped with a SnackBar hint
- Allowed: jpg, jpeg, png, gif, webp, pdf

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 06:02:47 -08:00
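Note on 7e44ddc358: post-selection extension validation amounts to partitioning the picked files by suffix. The sketch below shows the rule in Python for illustration (the real change is in the Flutter app). The helper name `split_by_extension` is hypothetical, while the allowed list is copied from the commit.

```python
from pathlib import PurePath
from typing import Iterable, List, Tuple

# Allowed extensions from the commit message.
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "webp", "pdf"}


def split_by_extension(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (accepted, skipped) after a case-insensitive suffix check."""
    accepted: List[str] = []
    skipped: List[str] = []
    for path in paths:
        ext = PurePath(path).suffix.lower().lstrip(".")
        (accepted if ext in ALLOWED_EXTENSIONS else skipped).append(path)
    return accepted, skipped
```

The skipped list is what would drive the SnackBar hint mentioned in the commit.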
hailin 4987cad881 fix: increase body parser limit to 50mb for large PDF uploads
Claude API supports up to 32MB PDFs; base64 encoding adds ~33% overhead.
50mb body limit covers the maximum single-document upload case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 05:35:43 -08:00
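Note on 4987cad881: the sizing argument can be checked directly. Base64 emits 4 output bytes for every 3 input bytes (rounded up), so a 32MB PDF grows to roughly 42.7MB before any JSON envelope is added. The sketch assumes "MB" and "mb" both mean 1024 * 1024 bytes.

```python
import math

MB = 1024 * 1024  # assumed unit for both the PDF limit and the body limit


def base64_size(raw_bytes: int) -> int:
    """Length in bytes of the padded base64 encoding of raw_bytes of input."""
    return 4 * math.ceil(raw_bytes / 3)


max_pdf = 32 * MB
encoded = base64_size(max_pdf)
body_limit = 50 * MB
headroom = body_limit - encoded  # space left for the rest of the request body
```

The remaining headroom of about 7MB covers the surrounding JSON fields for a single maximum-size document.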