Commit Graph

35 Commits

Author SHA1 Message Date
hailin 64499a5d86 feat(dingtalk): 小龙虾招募全语音/文字引导流程 + OAuth 一键授权卡片
## 功能说明
用户通过语音或文字说「帮我招募一只小龙虾」,iAgent 全程引导完成
OpenClaw 实例创建 + 钉钉 OAuth 一键授权绑定。

## 核心设计
- 语音场景 (claude_agent_sdk): Claude 通过 Bash/wget 调用内部 HTTP
  端点触发 OAuth,绕开 ToolExecutor 限制,两引擎均兼容
- 文字场景 (claude_api): 使用 initiate_dingtalk_binding 自定义工具,
  通过 uiEvent 机制传递 OAuth URL

## agent-service 变更
- agent-engine.port.ts: EngineStreamEvent 联合类型新增 oauth_prompt
- allowed-tools-resolver.service.ts: initiate_dingtalk_binding 加入
  ALL_SDK_TOOLS / admin / operator 工具白名单
- tool-executor.ts: 新增 executeInitiateDingTalkBinding(),调用内部
  oauth/init 端点获取 OAuth URL,返回 uiEvent
- claude-api-engine.ts: 在 tool_result 之后检查 result.uiEvent 并
  yield 出去;buildToolDefinitions 注册 initiate_dingtalk_binding schema
- system-prompt-builder.ts:
  - SystemPromptContext 新增 sessionId? 字段
  - 语音 session (sessionId 存在) → Step 3 使用 wget 调用
    POST /sessions/{sessionId}/dingtalk/oauth-trigger(两引擎通用)
  - 文字 session (无 sessionId) → Step 3 调用 initiate_dingtalk_binding
    工具(claude_api 专用)
- voice-session.controller.ts:
  - 注入 AgentStreamGateway / DingTalkRouterService / AgentInstanceRepository
  - startVoiceSession: 提前确定 sessionId,在 build() 前传入,使系统
    提示能内嵌正确的端点 URL
  - 新增 POST :sessionId/dingtalk/oauth-trigger — 无 JWT(内部端点,
    由 Claude Bash 工具调用),sessionId 作为能力令牌;生成 OAuth URL
    并通过 gateway.emitStreamEvent 直接推送 oauth_prompt 事件到 WS 流

## voice-agent 变更
- agent.py: 构造 AgentServiceLLM 时传入 room=ctx.room
- agent_llm.py:
  - __init__ 增加 room 参数,存储为 self._room
  - 新增 _publish_oauth_prompt(evt_data): null-safe,通过 LiveKit
    publish_data(topic="oauth_prompt") 推送到 Flutter
  - _do_inject_voice / _do_inject / _do_stream_voice / _do_stream:
    处理 oauth_prompt 事件 → asyncio.create_task(_publish_oauth_prompt)
  - 替换已弃用的 asyncio.ensure_future / get_event_loop().create_task
    → asyncio.create_task(Python 3.10+ 兼容)

## Flutter 变更
- agent_call_page.dart: DataReceivedEvent 监听 topic="oauth_prompt",
  解析 url/instanceName,弹出 _showOAuthBottomSheet(深色主题,🦞
  图标,「立即授权」按钮 launchUrl externalApplication)
- stream_event.dart: 新增 OAuthPromptEvent(url, instanceId, instanceName)
- stream_event_model.dart: toEntity() 新增 'oauth_prompt' case
- chat_message.dart: MessageType 枚举新增 oauthPrompt
- chat_providers.dart: _handleStreamEvent 新增 OAuthPromptEvent case,
  生成 type=oauthPrompt 的 ChatMessage(metadata 含 url/instanceName)
- chat_page.dart: 新增 oauthPrompt 时间线节点 + _OAuthPromptCard 组件
  (「立即授权」按钮,launchUrl externalApplication);import url_launcher

## 修复的关键 Bug
1. [严重] initiate_dingtalk_binding 只对 claude_api 有效,语音默认用
   claude_agent_sdk → 新 wget 端点两引擎均可用
2. [严重] 文字聊天页面不处理 oauth_prompt 事件(静默丢弃)→ 补全
   Flutter 4 处代码(entity/model/provider/page)
3. [中]   _publish_oauth_prompt 缺 local_participant null 检查 → 已修复
4. [轻]   asyncio.ensure_future / get_event_loop() 弃用警告 → 已修复

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 11:22:06 -07:00
hailin 6be84617d2 feat(flutter): i18n体系(zh/zh_TW/en) + 智能体解聘功能
- 建立完整 flutter_localizations i18n 体系:zh/zh_TW/en 三语言
- l10n.yaml + ARB 文件 (app_zh.arb 约120键作模板,zh_TW/en 对应覆盖)
- localeProvider 连接 SharedPreferences language 设置,实时切换语言
- 设置页加入语言选择器(简体中文/繁体中文/English)
- 我的智能体页实现解聘(解聘确认弹窗 + DELETE API)与重命名功能
- 全部页面 (~18个) UI 字符串替换为 AppLocalizations.of(context).xxx

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 00:05:55 -08:00
hailin 38594d6fd4 feat(flutter): rename iAgent→我智能体,创建/删除→招募,拉近人机关系距离
- App title、登录页、导航Tab、通话页等全局将 iAgent 改为 我智能体
- 底部导航 Tab "我的创建" → "我的智能体"
- 智能体语境下 "创建" → "招募":招募你的专属智能体、帮我招募一个...
- tasks_page 空状态文案 "创建" → "新增"(非智能体语境保持语义准确)
- 终端欢迎语、通知渠道描述同步更新

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 23:10:07 -08:00
hailin 5721d75461 feat(it0_app): add PTT mode to agent call page
- Default to PTT (push-to-talk) on call connect: mic muted until user holds button
- Toggle switch between PTT and free voice mode in active call controls
- PTT button: press-and-hold unmutes mic, release mutes again
- Voice message bubble (waveform + duration) appears after each PTT send
- Mute button hidden in PTT mode (mic controlled by PTT button)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 06:49:53 -08:00
hailin 26369be760 docs: add detailed comments for thinking state indicator mechanism
voice-agent agent.py:
- Module docstring explains lk.agent.state lifecycle
  (initializing → listening → thinking → speaking)
- Explains how RoomIO publishes state as participant attribute
- Documents BackgroundAudioPlayer with all available built-in clips

Flutter agent_call_page.dart:
- Documents _agentState field and all possible values
- Documents ParticipantAttributesChanged listener with UI mapping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 06:13:54 -08:00
hailin 33bd1aa3aa feat: add "thinking" state indicator for voice calls
- voice-agent: enable BackgroundAudioPlayer with keyboard typing sound
  during LLM thinking state (auto-plays when agent enters "thinking",
  stops when "speaking" starts)
- Flutter: monitor lk.agent.state participant attribute from LiveKit
  agent, show pulsing dots animation + "思考中..." text when thinking,
  avatar border changes to warning color with pulsing glow ring
- Both call mode and chat mode headers show thinking state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 05:45:04 -08:00
hailin 63b986fced fix: redesign voice call mixed-mode input with dual-layout architecture
Problem:
- Text input area caused BOTTOM OVERFLOWED BY 135 PIXELS when keyboard opened
- Input bar overlapped with call control buttons
- Sent messages were not displayed on screen (only SnackBar feedback)

Solution — split into two distinct layouts:

1. Call Mode (default):
   - Full-screen call UI: avatar, waveform, duration, large control buttons
   - Keyboard button in controls toggles to chat mode
   - No text input elements — clean voice-only interface

2. Chat Mode (tap keyboard button):
   - Compact call header: green status dot + "iAgent" + duration + inline
     mute/end/speaker/collapse controls
   - Scrollable message list (Expanded widget — properly handles keyboard)
   - User messages: right-aligned blue bubbles with attachment thumbnails
   - Agent responses: left-aligned gray bubbles with robot avatar
   - Input bar at bottom: attachment picker + text field + send button

Message display:
- User-sent text/attachments tracked in _messages list, shown as bubbles
- Agent responses sent back via LiveKit data channel (topic='text_reply')
  from voice-agent → Flutter, displayed as assistant bubbles
- Auto-scroll to latest message

Voice-agent change (agent.py):
- After session.say(response), publish response text back to Flutter via
  ctx.room.local_participant.publish_data() with topic='text_reply'
- Flutter listens for DataReceivedEvent to display agent responses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 06:11:07 -08:00
hailin ce63ece340 feat: add mixed-mode input (text + images + files) during voice calls
Enable users to send text messages, images, and files to the Agent
while an active voice call is in progress. This addresses the case
where spoken instructions are unclear or screenshots/documents need
to be shared for analysis.

## Architecture

Data flows through LiveKit data channel (not direct HTTP):
  Flutter → publishData(topic='text_inject') → voice-agent
  → llm.inject_text_message() → POST /api/v1/agent/tasks (same session)
  → collect streamed response → session.say() → TTS playback

This preserves the constraint that voice-agent owns the agent-service
sessionId — Flutter never contacts agent-service directly.

## Flutter UI (agent_call_page.dart)
- Add keyboard toggle button to active call controls (4-button row)
- Collapsible text input area with attachment picker (+) and send button
- Attachment support: gallery multi-select, camera, file picker
  (images max 1024x1024 quality 80%, PDF supported, max 5 attachments)
- Horizontal scrolling attachment preview with delete buttons
- 200KB payload size check before LiveKit data channel send
- Layout adapts: Spacer flex 1/3 toggle, reduced bottom padding

## voice-agent (agent.py)
- Register data_received event listener after session.start()
- Filter for topic='text_inject', parse JSON payload
- Call llm.inject_text_message(text, attachments) and TTS via session.say()
- Use asyncio.ensure_future() wrapper for async handler (matches
  existing disconnect handler pattern for sync EventEmitter)

## AgentServiceLLM (agent_llm.py)
- New inject_text_message(text, attachments) method on AgentServiceLLM
- Reuses same _agent_session_id for conversation context continuity
- WS+HTTP streaming: connect, pre-subscribe, POST /tasks with
  attachments field, collect full text response, return string
- _injecting flag prevents concurrent _do_stream from clearing
  session ID on abort errors while inject is in progress
- Same systemPrompt/voiceMode/engineType as voice pipeline

No agent-service changes required — attachments already supported
end-to-end (JSONB storage → multimodal content blocks → Claude).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 05:38:04 -08:00
hailin 59a3e60b82 feat: add engine type selection (Agent SDK / Claude API) for voice calls
Full-stack implementation allowing users to choose between Claude Agent SDK
(default, with tool approval, skill injection, session resume) and Claude API
(direct, lower latency) in Flutter settings. Agent SDK mode wraps prompts with
voice-conversation instructions for concise spoken Chinese output.

Data flow: Flutter Settings → SharedPreferences → POST /livekit/token →
RoomAgentDispatch metadata → voice-agent → AgentServiceLLM(engine_type)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 02:11:51 -08:00
hailin c9e196639a fix: add missing subscribe parameter to Timeouts constructor
All 6 Timeouts parameters are required in livekit_client 2.6.4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 23:52:49 -08:00
hailin 3fd27ff190 fix: add required debounce parameter to Timeouts constructor
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 23:42:54 -08:00
hailin e66c187353 fix: improve voice pipeline robustness for poor network conditions
Flutter (agent_call_page.dart):
- Add ConnectOptions with 15s timeouts for connection/peerConnection/iceRestart
- Add RoomReconnectingEvent/RoomAttemptReconnectEvent/RoomReconnectedEvent
  listeners with "网络重连中" UI indicator during reconnection
- Add TimeoutException detection in _friendlyError()

voice-agent (agent.py):
- Wrap entrypoint() in try-except with full traceback logging
- Register room disconnect listener to close httpx clients (instead of
  finally block, since session.start() returns while session runs in bg)
- Add asyncio import for ensure_future cleanup

voice-agent LLM proxy (agent_llm.py):
- Add retry with exponential backoff (max 2 retries, 1s/3s delays) for
  network errors (ConnectError/ConnectTimeout/OSError) and WS InvalidStatusCode
- Extract _do_stream() method for single-attempt logic
- Add WebSocket connection params: open_timeout=10, ping_interval=20,
  ping_timeout=10 for keepalive and faster dead-connection detection
- Use granular httpx.Timeout(connect=10, read=30, write=10, pool=10)
- Increase WS recv timeout from 5s to 30s to reduce unnecessary loops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 23:34:55 -08:00
hailin 5460be8c04 feat: add TTS voice and style settings to Flutter app
Add user-configurable TTS voice and tone style settings that flow from
the Flutter app through the backend to the voice-agent at call time.

## Flutter App (it0_app)

### Domain Layer
- app_settings.dart: Add `ttsVoice` (default: 'coral') and `ttsStyle`
  (default: '') fields to AppSettings entity with copyWith support

### Data Layer
- settings_datasource.dart: Add SharedPreferences keys
  `settings_tts_voice` and `settings_tts_style` for local persistence
  in loadSettings(), saveSettings(), and clearSettings()

### Presentation Layer
- settings_providers.dart: Add `setTtsVoice()` and `setTtsStyle()`
  methods to SettingsNotifier for Riverpod state management
- settings_page.dart: Add "语音" settings group between Notifications
  and Security groups with:
  - Voice picker: 13 OpenAI voices with gender/style labels
    (e.g. "女 · 温暖", "男 · 沉稳", "中性") in a BottomSheet
  - Style picker: 5 presets (专业干练/温柔耐心/轻松活泼/严肃正式/科幻AI)
    as ChoiceChips + custom text input field + reset button

### Call Flow
- agent_call_page.dart: Send `tts_voice` and `tts_style` in the POST
  body when requesting a LiveKit token at call initiation

## Backend

### voice-service (Python/FastAPI)
- livekit_token.py: Accept optional `tts_voice` and `tts_style` via
  Pydantic TokenRequest body model; embed them in RoomAgentDispatch
  metadata JSON alongside auth_header (backward compatible)

### voice-agent (Python/LiveKit Agents)
- agent.py: Extract `tts_voice` and `tts_style` from ctx.job.metadata;
  use them when creating openai_plugin.TTS() — user-selected voice
  overrides config default, user-selected style overrides default
  instructions. Falls back to config defaults when not provided.

## Data Flow
Flutter Settings → SharedPreferences → POST /livekit/token body →
voice-service embeds in RoomAgentDispatch metadata →
voice-agent reads from ctx.job.metadata → TTS creation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 09:38:15 -08:00
hailin 46a2d06be3 fix: implement speaker/earpiece toggle on voice call page
Use Hardware.instance.setSpeakerphoneOn() to switch between speaker
and earpiece modes. Default to speaker on.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:11:29 -08:00
hailin 19efeec26d fix: remove unsupported audioBitrate param from AudioPublishOptions
livekit_client 2.6.4 no longer has audioBitrate parameter.
Default AudioPublishOptions auto-selects optimal speech bitrate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 10:13:11 -08:00
hailin 94a14b3104 feat: migrate voice call from WebSocket/PCM to LiveKit WebRTC
实时语音对话架构迁移:WebSocket → LiveKit WebRTC

## 背景
原语音通话架构基于 FastAPI WebSocket 传输原始 PCM,管道串行执行
(VAD → 批量STT → Agent → 攒句 → 批量TTS),首音频延迟约 6 秒。
迁移到 LiveKit Agents 框架后,利用 WebRTC 传输 + 流水线并行,
预期延迟降至 1.5-2 秒。

## 架构
Flutter App ←── WebRTC (Opus/UDP) ──→ LiveKit Server ←──→ Voice Agent
  livekit_client                      (自部署, Go)       (Python, LiveKit Agents SDK)
                                                          ├─ VAD (Silero)
                                                          ├─ STT (faster-whisper / OpenAI)
                                                          ├─ LLM (自定义插件 → agent-service)
                                                          └─ TTS (Kokoro / OpenAI)

关键设计:LLM 不直接调用 Claude API,而是通过自定义插件代理到现有
agent-service,保留 Tool Use、会话历史、租户隔离等能力。

## 新增服务

### voice-agent (packages/services/voice-agent/)
LiveKit Agent Worker,包含:
- agent.py: 入口,prewarm() 预加载模型,entrypoint() 编排会话
- plugins/agent_llm.py: 自定义 LLM 插件,代理 agent-service API
  - POST /api/v1/agent/tasks 创建任务
  - WS /ws/agent 订阅流式事件 (stream_event)
  - 跨轮复用 session_id 保持对话上下文
- plugins/whisper_stt.py: 本地 faster-whisper STT (批量识别)
- plugins/kokoro_tts.py: 本地 Kokoro-82M TTS (24kHz PCM)
- config.py: pydantic-settings 配置

### LiveKit Server (deploy/docker/)
- livekit.yaml: 信令端口 7880, RTC TCP 7881, UDP 50000-50200
- docker-compose.yml: 新增 livekit-server + voice-agent 容器

### LiveKit Token 端点
- voice-service/src/api/livekit_token.py:
  POST /api/v1/voice/livekit/token
  生成 Room JWT,嵌入 auth_header 到 AgentDispatch metadata

## Flutter 客户端改造
- agent_call_page.dart: 从 ~814 行简化到 ~380 行
  - 替换: WebSocketChannel, AudioRecorder, PcmPlayer, 手动心跳/重连
  - 使用: Room.connect(), setMicrophoneEnabled(true), LiveKit 事件监听
  - 波形动画改用 participant.audioLevel
- pubspec.yaml: 添加 livekit_client: ^2.3.0
- app_config.dart: 增加 livekitUrl 字段
- api_endpoints.dart: 增加 livekitToken 端点

## 配置说明 (环境变量)
- STT_PROVIDER: local (默认, faster-whisper) / openai
- TTS_PROVIDER: local (默认, Kokoro) / openai
- WHISPER_MODEL: base (默认) / small / medium / large
- WHISPER_LANGUAGE: zh (默认)
- KOKORO_VOICE: zf_xiaoxiao (默认)
- DEVICE: cpu (默认) / cuda

## 不变的部分
- agent-service: 完全不改,voice-agent 通过现有 API 调用
- voice-service 核心: pipeline/STT/TTS/VAD 保留 (Twilio 备用)
- Kong 网关: 现有路由不变
- 数据库: 无 schema 变更

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 08:55:33 -08:00
hailin bc7e32061a fix: improve voice call reconnection robustness
Server side (session_router.py):
- /reconnect now accepts sessions in "active" state (not just "disconnected")
- When client reconnects to an active session, the old WebSocket/pipeline is
  automatically replaced when the new WebSocket connects
- Only truly terminal states (e.g. "ended") return 409

Flutter side (agent_call_page.dart):
- Distinguish terminal errors (404 session gone, 409 ended) from transient
  errors (network timeout, server unreachable) in reconnect loop
- Terminal errors break immediately instead of wasting retry attempts
- Extract _connectWebSocket() helper for cleaner reconnect flow
- Add DioException handling for proper HTTP status code inspection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 07:33:34 -08:00
hailin e706a4cdc7 fix: enable simultaneous playback + recording in voice call
Root cause: PcmPlayer called openPlayer() without audio session config,
so Android defaulted to earpiece-only mode. When the mic was actively
recording, playback was silently suppressed — the agent's TTS audio was
sent successfully over WebSocket but never reached the speaker.

Changes:

1. PcmPlayer (pcm_player.dart):
   - Added audio_session package for proper audio session management
   - Configure AudioSession with playAndRecord category so mic + speaker
     work simultaneously
   - Set voiceCommunication usage to enable Android hardware AEC (echo
     cancellation) — prevents feedback loops when speaker is active
   - defaultToSpeaker routes output to loudspeaker instead of earpiece
   - Restored setSpeakerOn() method stub (used by UI toggle)

2. AgentCallPage (agent_call_page.dart):
   - Fixed fire-and-forget bug: _pcmPlayer.feed() returns Future but was
     called without await, causing interleaved feedUint8FromStream calls
   - Added _feedChain serializer to guarantee sequential audio feeding

3. Dependencies:
   - Added audio_session package to pubspec.yaml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 06:48:16 -08:00
hailin 4456550393 feat: lazy-load local TTS/STT models on first request
Local /synthesize and /transcribe endpoints now auto-load Kokoro/Whisper
models on first call instead of returning 503 when not pre-loaded at
startup. This allows switching between Local and OpenAI providers in the
Flutter test page without requiring server restart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 04:38:49 -08:00
hailin 7cda482e49 fix: simplify _dioBinary in voice test page to avoid interceptor conflicts
Remove shared interceptors from the binary Dio instance to prevent
request dedup/retry interceptors from interfering with audio downloads.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 17:57:58 -08:00
hailin f7d39d8544 fix: use theme-aware colors in voice test page for dark mode readability
Replace hardcoded Colors.grey with Theme.of(context).colorScheme for
result containers and status text so they're readable in both light
and dark themes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 09:21:06 -08:00
hailin d43baed3a5 feat: add OpenAI TTS/STT API endpoints for comparison testing
- Add openai package to voice-service requirements
- Add /api/v1/test/tts/synthesize-openai (tts-1/tts-1-hd/gpt-4o-mini-tts)
- Add /api/v1/test/stt/transcribe-openai (gpt-4o-transcribe/whisper-1)
- Add OPENAI_API_KEY and OPENAI_BASE_URL env vars to voice-service
- Flutter test page: SegmentedButton to toggle Local/OpenAI provider
- All endpoints maintain same response format for easy comparison

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 07:20:03 -08:00
hailin ac0b8ee1c6 fix: rewrite voice test page using flutter_sound for both record and play
- Remove record package dependency, use FlutterSoundRecorder instead
- Use permission_handler for microphone permission (already in pubspec)
- Proper temp file path via path_provider
- Cleanup temp files after upload
- Single package (flutter_sound) handles both recording and playback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 05:41:10 -08:00
hailin d4783a3497 fix: use temp directory path for audio recording instead of empty string
The record package requires a valid file path. Empty string caused
ENOENT (No such file or directory) on Android.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 05:39:07 -08:00
hailin 5d4fd96d43 feat: streaming claude-api engine, engineType override, fix voice test page
- Claude API engine now uses streaming API (messages.stream) for real-time
  text delta output instead of waiting for full response
- Agent controller accepts optional engineType body parameter to allow
  callers (e.g. voice pipeline) to select a specific engine
- Fix voice_test_page.dart compilation error: replace audioplayers (not
  installed) with flutter_sound (already in pubspec.yaml)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 05:30:11 -08:00
hailin 6e832c7615 feat: add voice I/O test page in Flutter settings
- TTS: text input → Kokoro synthesis → audio playback
- STT: long-press record → faster-whisper transcription
- Round-trip: record → STT → TTS → playback
- Added /api/v1/test route to Kong gateway for voice-service
- Accessible from Settings → 语音 I/O 测试

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 05:16:10 -08:00
hailin 7afbd54fce fix: rewrite voice pipeline for direct WebSocket I/O, fix TTS and navigation
Root cause: Pipecat's WebsocketServerTransport creates its own WebSocket
server on (host,port) and expects FrameProcessor subclasses. Our code was
passing a FastAPI WebSocket object as 'host' and using plain STT/TTS/VAD
service classes that aren't FrameProcessors. The pipeline crashed immediately
when receiving audio, causing "disconnects when speaking".

Changes:
- **base_pipeline.py**: Complete rewrite — replaced Pipecat Pipeline with
  direct async loop: WebSocket → VAD → STT → Claude LLM → TTS → WebSocket.
  Supports barge-in (interrupt TTS when user speaks), audio chunking, and
  24kHz→16kHz TTS resampling.
- **session_router.py**: Pass WebSocket directly to pipeline instead of
  wrapping in AppTransport.
- **app_transport.py**: Deprecated (no longer needed).
- **kokoro_service.py**: Fix misaki compatibility (MutableToken→MToken
  rename), use correct Chinese voice 'zf_xiaoxiao', handle torch tensors.
- **main.py**: Apply misaki monkey-patch before importing kokoro.
- **settings.py**: Change default TTS voice from 'zh_female_1' (non-existent)
  to 'zf_xiaoxiao' (valid Kokoro-82M Chinese female voice).
- **requirements.txt**: Remove pipecat-ai dependency, pin kokoro==0.3.5 +
  misaki==0.7.17, add Chinese NLP deps (pypinyin, cn2an, jieba, ordered-set).
- **agent_call_page.dart**: Wrap each cleanup step in try/catch to ensure
  Navigator.pop() always executes after call ends. Add 3s timeout on session
  delete request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 23:34:35 -08:00
hailin 2a87dd346e fix: send empty JSON body to voice session endpoint (fixes 422)
FastAPI 的 create_session 端点声明了 Pydantic request body
(CreateSessionRequest),虽然所有字段都有默认值,但 FastAPI
仍要求请求包含有效 JSON body(至少 {})。Flutter 端 dio.post
未传 data 参数导致 Content-Type 缺失,FastAPI 返回 422
Unprocessable Entity。修复:添加 data: {} 发送空 JSON 对象。

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 21:21:48 -08:00
hailin 45c54acb87 fix: improve voice call page UI centering and error display
1. 布局居中:将 Column 包裹在 Center 中,所有文本添加
   textAlign: TextAlign.center,确保头像、标题、副标题
   在各种屏幕尺寸上居中显示。

2. 错误展示优化:将 SnackBar 大面积红色块替换为行内错误卡片,
   采用圆角容器 + error icon + 简洁文案,视觉上更融洽。
   新增 _errorMessage 状态字段 + _friendlyError() 方法,
   将 DioException 等异常转换为中文友好提示(如 "语音服务
   暂不可用 (503)"),避免用户看到大段英文 stacktrace。

3. 错误状态清理:点击接听时自动清除上一次的 _errorMessage。

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 21:10:49 -08:00
hailin a6cd3c20d9 feat: add WebSocket robustness to voice call (heartbeat, reconnect, jitter buffer)
Addresses reliability gaps in the real-time voice WebSocket connection
between Flutter client and Python voice-service backend.

Backend (voice-service):
- Heartbeat: new _heartbeat_sender coroutine sends JSON ping text frames
  every 15s alongside the Pipecat pipeline; failed send = dead connection
- Session preservation: on WebSocket disconnect, sessions are now marked
  "disconnected" with a timestamp instead of being deleted, allowing
  reconnection within a configurable TTL (default 60s)
- Reconnect endpoint: POST /sessions/{id}/reconnect verifies the session
  is alive and in "disconnected" state, returns fresh websocket_url
- Reconnect-aware WS handler: detects "disconnected" sessions, cancels
  stale pipeline tasks, creates a new pipeline, sends "session.resumed"
- Background cleanup: asyncio loop every 30s removes sessions that have
  been disconnected longer than session_ttl
- Structured event protocol: text frames = JSON control messages
  (ping/pong/session.resumed/session.ended/error), binary = PCM audio
- New settings: session_ttl (60s), heartbeat_interval (15s),
  heartbeat_timeout (45s)

Flutter (agent_call_page.dart):
- Heartbeat monitoring: tracks last server ping timestamp, triggers
  reconnect if no ping received in 45s (3 missed intervals)
- Auto-reconnect: exponential backoff (1s→2s→4s→8s→16s), max 5 attempts;
  calls /reconnect endpoint to verify session, rebuilds WebSocket,
  resets audio buffer, restarts heartbeat
- Reconnecting UI: yellow warning banner "重新连接中... (N/5)" with
  spinner overlay during reconnection attempts
- WebSocket data routing: _onWsData distinguishes String (JSON control)
  from binary (audio) frames, handles ping/session.resumed/session.ended
- User-initiated disconnect guard: _userEndedCall flag prevents reconnect
  attempts when user intentionally hangs up
- session_id field compatibility: supports session_id/sessionId/id

Flutter (pcm_player.dart):
- Jitter buffer: queues incoming PCM chunks, starts playback only after
  accumulating 4800 bytes (150ms at 16kHz 16-bit mono) to smooth out
  network timing variance
- reset() method: clears buffer on reconnect to discard stale audio
- Buffer underrun handling: re-enters buffering phase if queue empties

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 07:32:19 -08:00
hailin 15e6fca6c0 fix: translate all remaining English UI strings to Chinese and remove dead code
- Translate approval_action_card (Approve/Reject/Cancel/Expired)
- Translate tool_execution_card status labels (Executing/Completed/Error)
- Translate chat_providers error messages and stream content
- Translate message_bubble "Thinking..." indicator
- Translate terminal page tooltips (Reconnect/Disconnect)
- Translate fallback values (Untitled/Unknown/No message) across all pages
- Translate auth error "Login failed" and stream error messages
- Remove dead voice_providers.dart (used speech_to_text which is not installed)
- Remove dead voice_input_button.dart (not referenced anywhere)
- Fix widget_test.dart (was referencing non-existent MyApp class)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 02:07:57 -08:00
hailin 092a561867 feat: 完成 iAgent App 三大功能 + 修复租户上下文
## 功能一:设置页(完整实现)
- 新增浅色主题(lightTheme),支持深色/浅色/跟随系统三种模式
- app.dart 接入 themeMode 动态切换
- 设置页完整重写:个人信息编辑、修改密码、主题切换、通知开关
- 新增 settings_remote_datasource 对接后端 admin/settings API
- settings_providers 新增 AccountProfileNotifier 管理远程个人资料

## 功能二:语音通话(音频集成)
- 添加 flutter_sound 依赖,创建 PcmPlayer 流式 PCM 播放器
- agent_call_page 替换空壳:真实麦克风采集(record + GTCRN 降噪)
- 真实 PCM 16kHz 流式播放,基于 RMS 能量驱动波形动画
- 修复 WebSocket URL 路径:/ws/voice/ → /api/v1/voice/ws/
- voice_repository_impl 支持后端返回相对路径自动拼接

## 功能三:推送通知(WebSocket MVP)
- 添加 flutter_local_notifications + socket_io_client 依赖
- 创建 AppNotification 实体、NotificationService(Socket.IO 连接 comm-service)
- 通知 providers:列表管理 + 未读计数
- 登录后自动连接通知服务,登出断开
- 底部导航 Alerts 标签添加未读角标(Badge)
- AndroidManifest 添加 POST_NOTIFICATIONS 权限
- main.dart 初始化本地通知插件

## 修复:租户上下文未初始化(500错误)
- 根因:登录后未设置 currentTenantIdProvider,导致 X-Tenant-Id 头缺失
- Flutter 端:login() 成功后从 JWT 设置 tenantId,logout 时清除
- 后端:tenant-context.middleware 增加 JWT tenantId 回退逻辑
- AuthUser 模型新增 tenantId 字段解析

新增 5 个文件,修改 16 个文件,添加 3 个依赖包

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 01:10:52 -08:00
hailin 4e1b75483d fix: 修复 .gitignore 误忽略 Flutter data/models/ 源码导致构建失败
问题描述:
  在其他机器上构建报错:
  "Error when reading 'lib/features/auth/data/models/auth_response.dart': 系统找不到指定的路径"
  导致 AuthUser、AuthResponse 等类型找不到,编译失败。

根本原因:
  根目录 .gitignore 第75行 "models/" 规则本意是忽略 ML 模型大文件,
  但该规则匹配了所有目录名为 models/ 的路径,包括 Flutter 项目中
  DDD 架构的 data/models/ 源码目录(共 11 个 models/ 目录、10 个 .dart 文件)。
  这些文件在本地存在但从未被 Git 追踪,其他机器 pull 后缺失这些文件。

修复内容:
  1. 修改 .gitignore: 将宽泛的 "models/" 替换为精确的规则
     - packages/services/voice-service/models/ — voice-service 下载的 ML 模型
     - *.pt, *.pth, *.safetensors — PyTorch/HuggingFace 模型二进制文件
     - 不再影响 Flutter 的 data/models/ 源码目录

  2. 提交之前被忽略的 10 个 Flutter model 文件:
     - auth/data/models/auth_response.dart — 登录响应 (accessToken, refreshToken, user)
     - chat/data/models/chat_message_model.dart — 聊天消息模型
     - chat/data/models/session_model.dart — 会话模型
     - chat/data/models/stream_event_model.dart — SSE 流事件模型
     - servers/data/models/server_model.dart — 服务器状态模型
     - approvals/data/models/approval_model.dart — 审批请求模型
     - alerts/data/models/alert_event_model.dart — 告警事件模型
     - agent_call/data/models/voice_session_model.dart — 语音会话模型
     - standing_orders/data/models/standing_order_model.dart — 常设指令模型
     - tasks/data/models/task_model.dart — 任务模型

  3. 同时提交:
     - it0_app/test/widget_test.dart — Flutter 默认测试
     - packages/services/voice-service/src/models/__init__.py — Python 模块初始化

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 16:29:03 -08:00
hailin 39c0d83424 feat: rename app from IT0 to iAgent (我智能体)
Web Admin:
- Update browser title to "iAgent Admin Console"
- Translation: en appTitle "iAgent", zh appTitle "我智能体"
- Sidebar: en "iAgent Admin", zh "我智能体"
- Settings placeholder: "iAgent Platform" / "我智能体平台"
- Update alt tags on logo images

Flutter:
- MaterialApp title: "iAgent"
- Chat: "Ask iAgent..." / "Start a conversation with iAgent"
- Terminal: "iAgent Remote Terminal"
- Agent call: "iAgent Calling" / "iAgent"

Logo SVG: text changed from "AI AGENT" to "iAgent"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 06:39:40 -08:00
hailin 00f8801d51 Initial commit: IT0 AI-powered server cluster operations platform
Full-stack monorepo with DDD + Clean Architecture:
- Backend: 7 NestJS microservices + 5 shared libraries (TypeScript)
- Mobile: Flutter app with Riverpod (Dart)
- Web Admin: Next.js dashboard with Zustand + React Query
- Voice: Python voice service (STT/TTS/VAD)
- Infra: Docker Compose, K8s manifests, Turborepo build

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 22:54:37 -08:00