hailin/it0 - it0 - AI Wolves Team

Commit Graph

Author	SHA1	Message	Date
hailin	72584182df	fix(chat): fix VoiceMicButton gesture conflict with IconButton tooltip GestureDetector was fighting with IconButton's inner Tooltip gesture recognizer — onLongPressStart was never called (only vibration from tooltip). Replaced with Listener (raw pointer events) + manual 500ms Timer, which bypasses the gesture arena entirely. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 07:47:48 -08:00
hailin	2182149c4c	feat(chat): voice-to-text fills input box instead of auto-sending - Add POST /api/v1/agent/transcribe endpoint (STT only, no agent trigger) - Add transcribeAudio() to chat datasource and provider - VoiceMicButton now fills the text input field with transcript; user reviews and sends manually - Add OPENAI_API_KEY/OPENAI_BASE_URL to agent-service in docker-compose Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 07:01:39 -08:00
hailin	5721d75461	feat(it0_app): add PTT mode to agent call page - Default to PTT (push-to-talk) on call connect: mic muted until user holds button - Toggle switch between PTT and free voice mode in active call controls - PTT button: press-and-hold unmutes mic, release mutes again - Voice message bubble (waveform + duration) appears after each PTT send - Mute button hidden in PTT mode (mic controlled by PTT button) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 06:49:53 -08:00
hailin	e6f864d409	fix(version-service+gateway+app): fix APK download 404 and SHA-256 false failure Three coordinated fixes to make in-app APK download work end-to-end: 1. version-service/main.ts: serve uploaded files as static assets via NestExpressApplication.useStaticAssets('/data/versions', prefix: '/downloads/versions'), so GET /downloads/versions/{platform}/{file} returns the actual APK stored in the Docker volume. 2. kong.yml: add /downloads/versions route to Kong so requests from the Flutter app can reach version-service through the API gateway. Previously only /api/v1/versions and /api/app/version were routed; the download URL returned by the check endpoint was unreachable (404). 3. download_manager.dart: skip SHA-256 verification when sha256Expected is empty string. The check endpoint always returns sha256:"" because version-service doesn't store file hashes. The previous code compared actual_hash == "" which always failed, causing the downloaded file to be deleted after a successful download. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 06:04:27 -08:00
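The SHA-256 fix in this commit (item 3) can be sketched roughly as follows. The real change lives in download_manager.dart; this Python version only illustrates the rule, and the function name `should_keep_download` is hypothetical.

```python
import hashlib


def should_keep_download(data: bytes, sha256_expected: str) -> bool:
    """Return True when the downloaded payload should be kept.

    Hypothetical helper: an empty expected hash means the server
    published none, so verification is skipped instead of failing.
    """
    if not sha256_expected:
        return True
    actual = hashlib.sha256(data).hexdigest()
    # Compare case-insensitively (an assumption; hex digests may arrive upper-cased).
    return actual == sha256_expected.lower()
```

Before the fix, the equivalent of the final comparison ran even when the expected value was `""`, so it could never succeed and the file was deleted after a successful download.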
hailin	0f328b9794	feat(it0_app): add detailed logging to VersionChecker for update diagnosis Add verbose debugPrint logs throughout VersionChecker to diagnose why app update check isn't triggering: - Log apiBaseUrl and full request URL + query params before the request - Log response status code and raw response body - Log explicit needUpdate=true/false with version details - Log version code comparison (server versionCode vs local buildNumber) - Add stack trace to all catch blocks for better error diagnosis Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 05:51:45 -08:00
hailin	55b983a950	feat(it0_app): add WhatsApp-style voice message with async agent interrupt New VoiceMicButton widget (press-and-hold to record, release to send): - Records audio to a temp .m4a file via the `record` package - Slide-up gesture cancels recording without sending - Pulsing red mic icon + "松开发送/松开取消" feedback during recording New flow for voice messages: 1. Temp "🎤 识别中..." bubble shown immediately 2. Audio uploaded to POST /api/v1/agent/sessions/:id/voice-message (multipart/form-data; backend runs Whisper STT) 3. Placeholder replaced with real transcript 4. WS stream subscribed via new subscribeExistingTask() to receive agent's streaming response — same pipeline as text chat Voice messages act as async interrupts: if the agent is mid-task the backend hard-cancels it before processing the new voice command, so whoever presses the mic button always takes priority. Files changed: chat_remote_datasource.dart — sendVoiceMessage() multipart upload chat_repository.dart — subscribeExistingTask() interface method chat_repository_impl.dart — implement subscribeExistingTask(); fix sendVoiceMessage() stub chat_providers.dart — ChatNotifier.sendVoiceMessage() voice_mic_button.dart — NEW press-and-hold recording widget chat_page.dart — mic button added to input area Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 03:20:41 -08:00
hailin	9546dab93d	fix(it0_app): stop using systemPrompt as conversation title Voice sessions set systemPrompt to the voice-mode instruction string, causing every voice conversation to display '你正在通过语音与用户实时对话。请…' as its title in the chat history list. Title derivation priority (highest to lowest): 1. metadata.title — explicit title saved by backend on first task 2. metadata.voiceMode == true → '语音对话 M/D HH:mm' 3. Fallback → '对话 M/D HH:mm' based on session createdAt	2026-03-04 02:32:08 -08:00
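The title derivation priority described in this commit could look roughly like the sketch below. The app itself is Dart; `derive_title` and the plain-dict shape of `metadata` are assumptions made for illustration.

```python
from datetime import datetime


def derive_title(metadata: dict, created_at: datetime) -> str:
    """Pick a conversation title using the three-step priority (sketch)."""
    # 'M/D HH:mm' stamp built from the session creation time.
    stamp = f"{created_at.month}/{created_at.day} {created_at:%H:%M}"
    title = metadata.get("title")
    if title:
        return title  # 1. explicit title saved by the backend
    if metadata.get("voiceMode") is True:
        return f"语音对话 {stamp}"  # 2. voice session
    return f"对话 {stamp}"  # 3. fallback
```

Note that systemPrompt is never consulted, which is the point of the fix.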
hailin	9ed80cd0bc	feat: implement complete commercial monetization loop (Phases 1-4) ## Phase 1 - Token Metering + Quota Enforcement ### Usage Tracking - agent-service: add UsageRecord entity (per-tenant schema) tracking inputTokens/outputTokens/costUsd per AI task - Modify all 3 AI engines (claude-api, claude-code-cli, claude-agent-sdk) to emit separate input/output token counts in the `completed` event - claude-api-engine: costUsd = (input * 3 + output * 15) / 1,000,000 (claude-sonnet-4-5 pricing: $3/MTok in, $15/MTok out) - agent.controller: persist UsageRecord and publish `usage.recorded` event to Redis Streams on every task completion (non-blocking) - shared/events: new events UsageRecordedEvent, SubscriptionChangedEvent, QuotaExceededEvent, PaymentReceivedEvent ### Quota Enforcement - TenantInfo: add maxServers, maxUsers, maxStandingOrders, maxAgentTokensPerMonth fields - TenantContextMiddleware: rewritten to query public.tenants table for real quota values; 5-min in-memory cache; plan-based fallback on error - TenantContextService: getTenant() returns null instead of throwing; added getTenantOrThrow() for strict callers - inventory-service/server.controller: 429 when maxServers exceeded - ops-service/standing-order.controller: 429 when maxStandingOrders exceeded - auth-service/auth.service: 429 when maxUsers exceeded - 002-create-tenant-schema-template.sql: add usage_records table ## Phase 2 - billing-service (New Microservice, port 3010) ### Domain Layer (public schema, all UUIDs) Entities: Plan, Subscription, Invoice, InvoiceItem, Payment, PaymentMethod, UsageAggregate Domain services: - SubscriptionLifecycleService: full state machine (trialing -> active -> past_due -> cancelled/expired); upgrades immediate, downgrades at period end - InvoiceGeneratorService: monthly invoice = base fee + overage charges; proration item for mid-cycle upgrades - OverageCalculatorService: (totalTokens - includedTokens) * overageRate ### Infrastructure (all repos use DataSource
directly, NOT TenantAwareRepository) - PlanRepository, SubscriptionRepository, InvoiceRepository (atomic transaction for invoice+items), PaymentRepository (payments + methods), UsageAggregateRepository (UPSERT via ON CONFLICT for atomic accumulation) ### Application Use Cases - CreateSubscriptionUseCase: called on tenant registration - ChangePlanUseCase: upgrade (immediate + proration) or downgrade (scheduled) - CancelSubscriptionUseCase: immediate or at-period-end - GenerateMonthlyInvoiceUseCase: cron target (1st of month 00:05 UTC); generates invoices, renews periods, applies scheduled downgrades - AggregateUsageUseCase: Redis Streams consumer group billing-service, upserts monthly usage aggregates from usage.recorded events - CheckTokenQuotaUseCase: hard limit enforcement per plan - CreatePaymentSessionUseCase + HandlePaymentWebhookUseCase ### REST API - GET /api/v1/billing/plans - GET/POST /api/v1/billing/subscription (+ /upgrade, /cancel) - GET /api/v1/billing/invoices (paginated) - GET /api/v1/billing/invoices/:id - POST /api/v1/billing/invoices/:id/pay - GET /api/v1/billing/usage/current + /history - CRUD /api/v1/billing/payment-methods - POST /api/v1/billing/webhooks/{stripe,alipay,wechat,crypto} ### Plan Seed (auto on startup via PlanSeedService) - free: $0/mo, 100K tokens, no overage, hard limit 100% - pro: $49.99/mo, 1M tokens, $8/MTok, hard limit 150% - enterprise: $199.99/mo, 10M tokens, $5/MTok, no hard limit ## Phase 3 - Payment Provider Integration ### PaymentProviderRegistry (Strategy Pattern, mirrors EngineRegistry) All providers use @Optional() injection; unconfigured providers omitted - StripeProvider: PaymentIntent API; webhook via stripe.webhooks.constructEvent - AlipayProvider: alipay-sdk; Native QR (precreate); RSA2 signature verify - WeChatPayProvider: v3 REST; Native Pay code_url; AES-256-GCM decrypt; HMAC-SHA256 request signing and webhook verification - CryptoProvider: Coinbase Commerce; hosted checkout; HMAC-SHA256 verify ### 
WebhookController All 4 webhook endpoints are public (no JWT) for payment provider callbacks. rawBody: true enabled in main.ts for signature verification. ## Infrastructure Changes - docker-compose.yml: billing-service container (port 13010); added as dependency of api-gateway - kong.yml: /api/v1/billing routes (JWT); /api/v1/billing/webhooks (public) - 005-create-billing-tables.sql: 7 billing tables + invoice sequence + ALTER tenants to add quota columns - run-migrations.ts: 005 runs as part of shared schema step ## Phase 4 - Frontend ### Web Admin (Next.js) New pages: - /billing: subscription card + token usage bar + warning banner + invoices - /billing/plans: comparison grid with USD/CNY toggle + upgrade/downgrade flow - /billing/invoices: paginated table with Pay Now button Sidebar: Billing group (CreditCard icon, 3 sub-items) i18n: billing keys added to en + zh sidebar translations ### Flutter App New feature module it0_app/lib/features/billing/: - BillingOverviewPage: plan card + token LinearProgressIndicator + latest invoice + upgrade button - BillingProvider (FutureProvider): parallel fetch subscription/quota/invoice Settings page: "订阅与用量" entry card Router: /settings/billing sub-route Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 21:09:17 -08:00
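The per-task cost and overage arithmetic from Phases 1 and 2 can be illustrated with a small sketch. The function names are hypothetical, and two details are assumptions rather than something the commit states: overage is clamped at zero when usage is under the included amount, and the overage rate is expressed in USD per million tokens (as in the plan seed).

```python
def task_cost_usd(input_tokens: int, output_tokens: int,
                  in_rate: float = 3.0, out_rate: float = 15.0) -> float:
    """Cost of one task at $3/MTok in and $15/MTok out (rates from the commit)."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def overage_charge_usd(total_tokens: int, included_tokens: int,
                       rate_per_mtok: float) -> float:
    """(totalTokens - includedTokens) * overageRate, clamped at zero (assumption)."""
    extra = max(0, total_tokens - included_tokens)
    return extra * rate_per_mtok / 1_000_000
```

For example, a pro tenant (1M tokens included, $8/MTok overage) that used 1.5M tokens would be charged for 500K extra tokens under these assumptions.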
hailin	26369be760	docs: add detailed comments for thinking state indicator mechanism voice-agent agent.py: - Module docstring explains lk.agent.state lifecycle (initializing → listening → thinking → speaking) - Explains how RoomIO publishes state as participant attribute - Documents BackgroundAudioPlayer with all available built-in clips Flutter agent_call_page.dart: - Documents _agentState field and all possible values - Documents ParticipantAttributesChanged listener with UI mapping Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 06:13:54 -08:00
hailin	33bd1aa3aa	feat: add "thinking" state indicator for voice calls - voice-agent: enable BackgroundAudioPlayer with keyboard typing sound during LLM thinking state (auto-plays when agent enters "thinking", stops when "speaking" starts) - Flutter: monitor lk.agent.state participant attribute from LiveKit agent, show pulsing dots animation + "思考中..." text when thinking, avatar border changes to warning color with pulsing glow ring - Both call mode and chat mode headers show thinking state Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 05:45:04 -08:00
hailin	63b986fced	fix: redesign voice call mixed-mode input with dual-layout architecture Problem: - Text input area caused BOTTOM OVERFLOWED BY 135 PIXELS when keyboard opened - Input bar overlapped with call control buttons - Sent messages were not displayed on screen (only SnackBar feedback) Solution — split into two distinct layouts: 1. Call Mode (default): - Full-screen call UI: avatar, waveform, duration, large control buttons - Keyboard button in controls toggles to chat mode - No text input elements — clean voice-only interface 2. Chat Mode (tap keyboard button): - Compact call header: green status dot + "iAgent" + duration + inline mute/end/speaker/collapse controls - Scrollable message list (Expanded widget — properly handles keyboard) - User messages: right-aligned blue bubbles with attachment thumbnails - Agent responses: left-aligned gray bubbles with robot avatar - Input bar at bottom: attachment picker + text field + send button Message display: - User-sent text/attachments tracked in _messages list, shown as bubbles - Agent responses sent back via LiveKit data channel (topic='text_reply') from voice-agent → Flutter, displayed as assistant bubbles - Auto-scroll to latest message Voice-agent change (agent.py): - After session.say(response), publish response text back to Flutter via ctx.room.local_participant.publish_data() with topic='text_reply' - Flutter listens for DataReceivedEvent to display agent responses Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 06:11:07 -08:00
hailin	ce63ece340	feat: add mixed-mode input (text + images + files) during voice calls Enable users to send text messages, images, and files to the Agent while an active voice call is in progress. This addresses the case where spoken instructions are unclear or screenshots/documents need to be shared for analysis. ## Architecture Data flows through LiveKit data channel (not direct HTTP): Flutter → publishData(topic='text_inject') → voice-agent → llm.inject_text_message() → POST /api/v1/agent/tasks (same session) → collect streamed response → session.say() → TTS playback This preserves the constraint that voice-agent owns the agent-service sessionId — Flutter never contacts agent-service directly. ## Flutter UI (agent_call_page.dart) - Add keyboard toggle button to active call controls (4-button row) - Collapsible text input area with attachment picker (+) and send button - Attachment support: gallery multi-select, camera, file picker (images max 1024x1024 quality 80%, PDF supported, max 5 attachments) - Horizontal scrolling attachment preview with delete buttons - 200KB payload size check before LiveKit data channel send - Layout adapts: Spacer flex 1/3 toggle, reduced bottom padding ## voice-agent (agent.py) - Register data_received event listener after session.start() - Filter for topic='text_inject', parse JSON payload - Call llm.inject_text_message(text, attachments) and TTS via session.say() - Use asyncio.ensure_future() wrapper for async handler (matches existing disconnect handler pattern for sync EventEmitter) ## AgentServiceLLM (agent_llm.py) - New inject_text_message(text, attachments) method on AgentServiceLLM - Reuses same _agent_session_id for conversation context continuity - WS+HTTP streaming: connect, pre-subscribe, POST /tasks with attachments field, collect full text response, return string - _injecting flag prevents concurrent _do_stream from clearing session ID on abort errors while inject is in progress - Same systemPrompt/voiceMode/engineType 
as voice pipeline No agent-service changes required — attachments already supported end-to-end (JSONB storage → multimodal content blocks → Claude). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 05:38:04 -08:00
hailin	59a3e60b82	feat: add engine type selection (Agent SDK / Claude API) for voice calls Full-stack implementation allowing users to choose between Claude Agent SDK (default, with tool approval, skill injection, session resume) and Claude API (direct, lower latency) in Flutter settings. Agent SDK mode wraps prompts with voice-conversation instructions for concise spoken Chinese output. Data flow: Flutter Settings → SharedPreferences → POST /livekit/token → RoomAgentDispatch metadata → voice-agent → AgentServiceLLM(engine_type) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 02:11:51 -08:00
hailin	c9e196639a	fix: add missing subscribe parameter to Timeouts constructor All 6 Timeouts parameters are required in livekit_client 2.6.4. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 23:52:49 -08:00
hailin	3fd27ff190	fix: add required debounce parameter to Timeouts constructor Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 23:42:54 -08:00
hailin	e66c187353	fix: improve voice pipeline robustness for poor network conditions Flutter (agent_call_page.dart): - Add ConnectOptions with 15s timeouts for connection/peerConnection/iceRestart - Add RoomReconnectingEvent/RoomAttemptReconnectEvent/RoomReconnectedEvent listeners with "网络重连中" UI indicator during reconnection - Add TimeoutException detection in _friendlyError() voice-agent (agent.py): - Wrap entrypoint() in try-except with full traceback logging - Register room disconnect listener to close httpx clients (instead of finally block, since session.start() returns while session runs in bg) - Add asyncio import for ensure_future cleanup voice-agent LLM proxy (agent_llm.py): - Add retry with exponential backoff (max 2 retries, 1s/3s delays) for network errors (ConnectError/ConnectTimeout/OSError) and WS InvalidStatusCode - Extract _do_stream() method for single-attempt logic - Add WebSocket connection params: open_timeout=10, ping_interval=20, ping_timeout=10 for keepalive and faster dead-connection detection - Use granular httpx.Timeout(connect=10, read=30, write=10, pool=10) - Increase WS recv timeout from 5s to 30s to reduce unnecessary loops Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 23:34:55 -08:00
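The retry policy described for agent_llm.py (max 2 retries, 1s/3s delays) might be sketched as below. This is a synchronous, standard-library-only illustration; the real code is async and catches httpx and WebSocket errors, and `call_with_retry` is a hypothetical name.

```python
import time

RETRY_DELAYS = (1.0, 3.0)  # two retries, delays taken from the commit message


def call_with_retry(operation, retryable=(ConnectionError, OSError),
                    delays=RETRY_DELAYS, sleep=time.sleep):
    """Run operation(); on a retryable error wait, then try again (sketch)."""
    for attempt in range(len(delays) + 1):
        try:
            return operation()
        except retryable:
            if attempt == len(delays):
                raise  # retries exhausted, surface the last error
            sleep(delays[attempt])
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.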
hailin	8a48e92970	fix: use domain names for API access, China IP for LiveKit Flutter app now uses https://it0api.szaiai.com (nginx reverse proxy) instead of direct IP:port. LiveKit URL uses China IP 14.215.128.96 for lower latency from domestic mobile clients. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 21:44:25 -08:00
hailin	68ee2516d5	fix: use host networking for voice services to eliminate docker-proxy overhead Bridge mode created 600+ docker-proxy processes for LiveKit's UDP port-range mappings (30000-30100, 50000-50200). Switch livekit-server, voice-agent, and voice-service to network_mode: host for zero-overhead networking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 19:58:32 -08:00
hailin	5460be8c04	feat: add TTS voice and style settings to Flutter app Add user-configurable TTS voice and tone style settings that flow from the Flutter app through the backend to the voice-agent at call time. ## Flutter App (it0_app) ### Domain Layer - app_settings.dart: Add `ttsVoice` (default: 'coral') and `ttsStyle` (default: '') fields to AppSettings entity with copyWith support ### Data Layer - settings_datasource.dart: Add SharedPreferences keys `settings_tts_voice` and `settings_tts_style` for local persistence in loadSettings(), saveSettings(), and clearSettings() ### Presentation Layer - settings_providers.dart: Add `setTtsVoice()` and `setTtsStyle()` methods to SettingsNotifier for Riverpod state management - settings_page.dart: Add "语音" settings group between Notifications and Security groups with: - Voice picker: 13 OpenAI voices with gender/style labels (e.g. "女 · 温暖", "男 · 沉稳", "中性") in a BottomSheet - Style picker: 5 presets (专业干练/温柔耐心/轻松活泼/严肃正式/科幻AI) as ChoiceChips + custom text input field + reset button ### Call Flow - agent_call_page.dart: Send `tts_voice` and `tts_style` in the POST body when requesting a LiveKit token at call initiation ## Backend ### voice-service (Python/FastAPI) - livekit_token.py: Accept optional `tts_voice` and `tts_style` via Pydantic TokenRequest body model; embed them in RoomAgentDispatch metadata JSON alongside auth_header (backward compatible) ### voice-agent (Python/LiveKit Agents) - agent.py: Extract `tts_voice` and `tts_style` from ctx.job.metadata; use them when creating openai_plugin.TTS() — user-selected voice overrides config default, user-selected style overrides default instructions. Falls back to config defaults when not provided. ## Data Flow Flutter Settings → SharedPreferences → POST /livekit/token body → voice-service embeds in RoomAgentDispatch metadata → voice-agent reads from ctx.job.metadata → TTS creation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 09:38:15 -08:00
hailin	46a2d06be3	fix: implement speaker/earpiece toggle on voice call page Use Hardware.instance.setSpeakerphoneOn() to switch between speaker and earpiece modes. Default to speaker on. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 11:11:29 -08:00
hailin	19efeec26d	fix: remove unsupported audioBitrate param from AudioPublishOptions livekit_client 2.6.4 no longer has audioBitrate parameter. Default AudioPublishOptions auto-selects optimal speech bitrate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 10:13:11 -08:00
hailin	94a14b3104	feat: migrate voice call from WebSocket/PCM to LiveKit WebRTC Real-time voice conversation architecture migration: WebSocket → LiveKit WebRTC ## Background The original voice call architecture sent raw PCM over a FastAPI WebSocket, and the pipeline ran serially (VAD → batch STT → Agent → sentence accumulation → batch TTS), with a time-to-first-audio of about 6 seconds. After migrating to the LiveKit Agents framework, WebRTC transport plus pipeline parallelism is expected to bring latency down to 1.5-2 seconds. ## Architecture Flutter App ←── WebRTC (Opus/UDP) ──→ LiveKit Server ←──→ Voice Agent livekit_client (self-hosted, Go) (Python, LiveKit Agents SDK) ├─ VAD (Silero) ├─ STT (faster-whisper / OpenAI) ├─ LLM (custom plugin → agent-service) └─ TTS (Kokoro / OpenAI) Key design: the LLM does not call the Claude API directly; a custom plugin proxies to the existing agent-service, preserving Tool Use, conversation history, tenant isolation, and related capabilities. ## New services ### voice-agent (packages/services/voice-agent/) LiveKit Agent Worker, containing: - agent.py: entry point; prewarm() preloads models, entrypoint() orchestrates the session - plugins/agent_llm.py: custom LLM plugin that proxies the agent-service API - POST /api/v1/agent/tasks creates a task - WS /ws/agent subscribes to streaming events (stream_event) - Reuses session_id across turns to keep conversation context - plugins/whisper_stt.py: local faster-whisper STT (batch recognition) - plugins/kokoro_tts.py: local Kokoro-82M TTS (24kHz PCM) - config.py: pydantic-settings configuration ### LiveKit Server (deploy/docker/) - livekit.yaml: signaling port 7880, RTC TCP 7881, UDP 50000-50200 - docker-compose.yml: adds livekit-server + voice-agent containers ### LiveKit Token endpoint - voice-service/src/api/livekit_token.py: POST /api/v1/voice/livekit/token generates a Room JWT and embeds auth_header in the AgentDispatch metadata ## Flutter client changes - agent_call_page.dart: simplified from ~814 lines to ~380 lines - Replaced: WebSocketChannel, AudioRecorder, PcmPlayer, manual heartbeat/reconnect - Now uses: Room.connect(), setMicrophoneEnabled(true), LiveKit event listeners - Waveform animation now driven by participant.audioLevel - pubspec.yaml: add livekit_client: ^2.3.0 - app_config.dart: add livekitUrl field - api_endpoints.dart: add livekitToken endpoint ## Configuration (environment variables) - STT_PROVIDER: local (default, faster-whisper) / openai - TTS_PROVIDER: local (default, Kokoro) / openai - WHISPER_MODEL: base (default) / small / medium / large - WHISPER_LANGUAGE: zh (default) - KOKORO_VOICE: zf_xiaoxiao (default) - DEVICE: cpu (default) / cuda ## Unchanged - agent-service: not modified at all; voice-agent calls it through the existing API - voice-service core: pipeline/STT/TTS/VAD retained (Twilio fallback) - Kong gateway: existing routes unchanged - Database: no schema changes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 08:55:33 -08:00
hailin	7e44ddc358	fix: file picker now shows subdirectories on Android FileType.custom with allowedExtensions causes Android system picker to hide subdirectories on some devices. Changed to FileType.any with post-selection extension validation instead. - Unsupported file types are skipped with a SnackBar hint - Allowed: jpg, jpeg, png, gif, webp, pdf Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 06:02:47 -08:00
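The post-selection extension validation in this commit can be sketched as follows. The allowed list comes from the commit message; the helper name `split_by_extension` and the case-insensitive matching are assumptions.

```python
from pathlib import PurePath

ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "webp", "pdf"}


def split_by_extension(file_names):
    """Partition picked files into (accepted, skipped) by extension (sketch)."""
    accepted, skipped = [], []
    for name in file_names:
        # suffix is '' for names without an extension, which never matches.
        ext = PurePath(name).suffix.lower().lstrip(".")
        (accepted if ext in ALLOWED_EXTENSIONS else skipped).append(name)
    return accepted, skipped
```

The skipped list is what the UI would report in the SnackBar hint.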
hailin	3025910095	ui: transparent compact AppBar (64dp → 44dp) - AppBar background transparent, merges with scaffold for seamless look - toolbarHeight reduced from 64dp to 44dp (~20dp screen space saved) - scrolledUnderElevation: 0 prevents Material 3 shadow on scroll - Icons 24→20px with VisualDensity.compact for tighter action buttons - Title fontSize 16 w600, less visual weight - Both dark and light themes updated consistently Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 05:20:23 -08:00
hailin	ed39518a71	feat: floating pill input bar + auto-scroll on history load Input area redesign (ChatGPT/Claude App style): - Replace fixed bottom bar with floating pill overlay using Stack+Positioned - Semi-transparent background (surface 92% opacity) with rounded corners (28px) - Drop shadow for depth separation from content - Remove inner TextField border (InputBorder.none) for cleaner look - ListView bottom padding increased to 80px to leave room under the pill - Input pill floats 12px from edges, 8px from bottom History scroll fix: - Add jump parameter to _scrollToBottom() for instant positioning - When loading conversation history (empty→many messages), use jumpTo instead of animateTo to avoid incomplete scroll on large message lists - Double-frame jumpTo ensures layout settles before final scroll position Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 05:15:18 -08:00
hailin	1f1bf18a75	fix: remove clipboard paste menu item, fix timeline line overlap, dim input placeholder - Remove redundant "从剪贴板粘贴" option from attachment menu (long-press to paste natively) - Remove super_clipboard dependency (no longer needed) - Fix timeline vertical line overlapping icon nodes by using dynamic dotRadius - Dim input field placeholder color to AppColors.textMuted Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 05:05:27 -08:00
hailin	cfc0a97da7	fix: correct super_clipboard getFile API call signature getFile requires two positional args: format and callback. Wrapped in Completer for async/await usage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 04:45:19 -08:00
hailin	5f28605e13	feat: add clipboard paste, multi-image select, and file picker - Add super_clipboard and file_picker dependencies - Clipboard paste: reads PNG/JPEG image data from system clipboard - Multi-image: pickMultiImage with remaining count limit - File picker: supports images (jpg/png/gif/webp) and PDF files - Updated attachment preview to show file icon for non-image types - Bottom sheet now shows 4 options: gallery, camera, clipboard, file Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 04:32:16 -08:00
hailin	e4c2505048	feat: add multimodal image input with streaming markdown optimization Two major features in this commit: 1. Streaming Markdown Rendering Optimization - Replace deprecated flutter_markdown with gpt_markdown (active, AI-optimized) - Real-time markdown rendering during streaming (was showing raw syntax) - Solid block cursor (█) instead of AnimationController blink - 80ms token throttle buffer reducing rebuilds from per-token to ~12.5/sec - RepaintBoundary isolation for markdown widget repaints - StreamTextWidget simplified from StatefulWidget to StatelessWidget 2. Multimodal Image Input (camera + gallery + display) - Flutter: image_picker for gallery/camera, base64 encoding, attachment preview strip with delete, thumbnails in sent messages - Data layer: List<String>? → List<Map<String, dynamic>>? for structured attachment payloads through datasource/repository/usecase - ChatAttachment model with base64Data, mediaType, fileName - ChatMessage entity + ChatMessageModel both support attachments field - Backend DTO, Entity (JSONB), Controller, ConversationContextService all extended to receive, store, and reconstruct Anthropic image content blocks in loadContext() - Claude API engine skips duplicate user message when history already ends with multimodal content blocks - NestJS body parser limit raised to 10MB for base64 image payloads - Android CAMERA permission added to manifest - Image.memory uses cacheWidth/cacheHeight for memory efficiency - Max 5 images per message enforced in UI Data flow: ImagePicker → base64Encode → ChatAttachment → POST body → DB (JSONB) → loadContext → Anthropic image content blocks → Claude API Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 03:24:17 -08:00
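The 80ms token throttle buffer mentioned above (at most about 12.5 UI rebuilds per second, since 1 / 0.08 = 12.5) might work along these lines. This is only a sketch of the idea in Python; the class name `ThrottledBuffer` and its interface are hypothetical, and the Flutter implementation may differ.

```python
import time


class ThrottledBuffer:
    """Collect streamed tokens and flush them at most once per interval."""

    def __init__(self, on_flush, interval=0.08, clock=time.monotonic):
        self._on_flush = on_flush
        self._interval = interval
        self._clock = clock
        self._pending = []
        self._last_flush = clock()

    def add(self, token: str) -> None:
        self._pending.append(token)
        now = self._clock()
        if now - self._last_flush >= self._interval:
            self.flush(now)

    def flush(self, now=None) -> None:
        """Emit buffered tokens as one chunk; call once more at stream end."""
        if self._pending:
            self._on_flush("".join(self._pending))
            self._pending.clear()
        self._last_flush = self._clock() if now is None else now
```

Each flush corresponds to one rebuild of the markdown widget, instead of one rebuild per token.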
hailin	89f0f6134d	fix: resolve bottom overflow issues in chat page timeline rendering Three root causes fixed: 1. TimelineEventNode: Replaced IntrinsicHeight (which forces intrinsic height calculation on unbounded content) with CustomPaint-based _TimelineLinePainter that draws vertical lines based on actual rendered widget size. Also added maxLines/ellipsis to label text and mainAxisSize.min on inner Column. 2. ApprovalActionCard: Changed countdown + action buttons layout from Row with Spacer (which requires infinite width) to Wrap with spacing, preventing horizontal overflow on narrow screens. 3. AnimatedCrossFade in _CollapsibleCodeBlock and _CollapsibleThinking: Wrapped with ClipRect and added sizeCurve: Curves.easeInOut to prevent the outgoing child from extending beyond parent bounds during the cross-fade transition. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 01:38:37 -08:00
hailin	50dbb641a3	fix: comprehensive hardening of agent task cancel/inject/approve flows 6 rounds of systematic audit identified and fixed 14 bugs across backend controller and Flutter client: ## Backend (agent.controller.ts) Security & Tenant Isolation: - Add @TenantId + ForbiddenException check to cancelTask, injectMessage, approveCommand — all 4 write endpoints now enforce tenant isolation - Add tenantId check on session reuse in executeTask to prevent cross-tenant session hijacking Architecture & Correctness: - Extract shared runTaskStream() from inline fire-and-forget block, used by both executeTask and injectMessage to reduce duplication - Use session.engineType (not getActiveEngine()) in cancelTask, injectMessage, approveCommand — fixes wrong-engine-cancel when global engine config is switched after task creation - Add concurrent task prevention: executeTask checks for existing RUNNING task on same session and cancels it before starting new one - Add runningTasks Map to track task promises, awaitTaskCleanup() helper with 3s timeout for inject to wait for partial text save - captureSdkSessionId() captures SDK session ID into metadata without DB save (callers persist), preventing fire-and-forget race Cancel/Reject Improvements: - cancelTask: idempotent (returns early if already CANCELLED/COMPLETED), session stays 'active' (was 'cancelled'), emits cancelled WS event - approveCommand reject: session stays 'active' (was 'cancelled'), now emits cancelled WS event so Flutter stream listeners clean up - approveCommand approved: collect text events and save assistant response to conversation history on completion (was missing) Minor: - task.result! non-null assertion → task.result ?? 
'Unknown error' - Add findRunningBySessionId() to TaskRepository ## Flutter API Contract Fix: - approveCommand: route changed from /api/v1/ops/approvals/:id/approve to /api/v1/agent/tasks/:id/approve with {approved: true} body - rejectCommand: route changed from /api/v1/ops/approvals/:id/reject to /api/v1/agent/tasks/:id/approve with {approved: false} body Resource Management: - ChatNotifier.dispose() now disconnects WebSocket to prevent connection leak when navigating away from chat Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 22:20:46 -08:00
hailin	d5f663f7af	feat: inject-message support for mid-stream task interruption Backend (agent-engine.port.ts): - Add `cancelled` event type: emitted when a task is cancelled (user-initiated or injection), so Flutter can close the old stream cleanly - Add `task_info` event type: emitted after inject to pass the new taskId to the client, enabling cancel/re-inject on the replacement task Flutter (features/chat/): - ChatState: track current `taskId` alongside `sessionId`; clear on completion or error - Handle `TaskInfoEvent`: update taskId in state when server issues a new task - Handle `CancelledEvent`: treat as stream termination (agentStatus → idle) - MessageType.interrupted: new UI node (warning style) for mid-stream cancels - _inject(): send text as an inject request while streaming; backend cancels the current task and starts a new one with the injected message - Input area: during streaming, hint changes to "追加指令...", Enter key calls _inject() instead of _send(), and both inject-send + stop buttons are shown - isAwaitingApproval kept separate from isStreaming so approval flow is not blocked by inject mode Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 21:33:50 -08:00
hailin	f5d9b1f04f	feat: add app upgrade system with self-hosted APK update support - Add core/updater module: version checker, download manager (resumable + SHA-256), APK installer, app market detector, self-hosted updater with progress dialogs - Add Android native MethodChannels for APK installation and market detection - Add FileProvider config and REQUEST_INSTALL_PACKAGES permission - Wire UpdateService singleton into main.dart initialization - Add auto-check on home entry with cooldown + app resume detection - Add manual "检查更新" ("Check for updates") button and dynamic version display in settings - Fix chat page: bottom overflow, bash spinner persistence, collapsible results - Merge standing orders into tasks page as second tab Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 22:35:01 -08:00
hailin	bc7e32061a	fix: improve voice call reconnection robustness Server side (session_router.py): - /reconnect now accepts sessions in "active" state (not just "disconnected") - When client reconnects to an active session, the old WebSocket/pipeline is automatically replaced when the new WebSocket connects - Only truly terminal states (e.g. "ended") return 409 Flutter side (agent_call_page.dart): - Distinguish terminal errors (404 session gone, 409 ended) from transient errors (network timeout, server unreachable) in reconnect loop - Terminal errors break immediately instead of wasting retry attempts - Extract _connectWebSocket() helper for cleaner reconnect flow - Add DioException handling for proper HTTP status code inspection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 07:33:34 -08:00
hailin	57fabb4653	fix: set interleaved=true for PcmPlayer streaming playback FlutterSoundPlayer.feedUint8FromStream() requires interleaved mode. With interleaved=false, every feed() call threw: "Cannot feed with UInt8 with non interleaved mode" - feedUint8FromStream (Uint8List) → requires interleaved: true - feedFromStream (Float32List) → requires interleaved: false Since we feed raw PCM bytes (Uint8List), interleaved must be true. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 06:59:06 -08:00
hailin	e706a4cdc7	fix: enable simultaneous playback + recording in voice call Root cause: PcmPlayer called openPlayer() without audio session config, so Android defaulted to earpiece-only mode. When the mic was actively recording, playback was silently suppressed — the agent's TTS audio was sent successfully over WebSocket but never reached the speaker. Changes: 1. PcmPlayer (pcm_player.dart): - Added audio_session package for proper audio session management - Configure AudioSession with playAndRecord category so mic + speaker work simultaneously - Set voiceCommunication usage to enable Android hardware AEC (echo cancellation) — prevents feedback loops when speaker is active - defaultToSpeaker routes output to loudspeaker instead of earpiece - Restored setSpeakerOn() method stub (used by UI toggle) 2. AgentCallPage (agent_call_page.dart): - Fixed fire-and-forget bug: _pcmPlayer.feed() returns Future but was called without await, causing interleaved feedUint8FromStream calls - Added _feedChain serializer to guarantee sequential audio feeding 3. Dependencies: - Added audio_session package to pubspec.yaml Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 06:48:16 -08:00
hailin	4456550393	feat: lazy-load local TTS/STT models on first request Local /synthesize and /transcribe endpoints now auto-load Kokoro/Whisper models on first call instead of returning 503 when not pre-loaded at startup. This allows switching between Local and OpenAI providers in the Flutter test page without requiring server restart. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 04:38:49 -08:00
hailin	7b71a4f2fc	fix: properly close WebSocket with subscription cancel + fire-and-forget Root cause: IOWebSocketChannel.sink.close() can hang indefinitely (dart-lang/web_socket_channel#185). Previous fix used unawaited close but didn't cancel the stream subscription, so the old listener could still push events to _messageController. Fix: Extract _closeCurrentConnection() that: 1. Cancels StreamSubscription first (stops duplicate events immediately) 2. Fire-and-forget sink.close(goingAway) (frees underlying socket) This follows the workaround recommended in the official issue tracker. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 03:45:43 -08:00
hailin	45eb6bc453	fix: use unawaited close to prevent WebSocket reconnect hang The await on sink.close() blocks indefinitely when the server doesn't respond to the close handshake. Use fire-and-forget with unawaited() so the new connection can proceed immediately. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 03:41:13 -08:00
hailin	3185438f36	fix: close previous WebSocket before opening new connection When sending a second message in the same session, the old WebSocket connection was not closed, causing both connections to subscribe to the same session room. This resulted in each text event being received twice, producing garbled/duplicated output text. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 03:37:16 -08:00
hailin	2403ce5636	feat: multi-turn conversation context management with session history UI Implement DB-based conversation message storage (engine-agnostic) that works across both Claude API and Agent SDK engines. Add ChatGPT/Claude-style conversation history drawer in Flutter with date-grouped session list, session switching, and new chat functionality. Backend: entity, repository, context service, migration 004, session/message API endpoints. Flutter: ConversationDrawer, sessionId flow from backend response via SessionInfoEvent, session list/switch/delete support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 19:04:35 -08:00
hailin	7cda482e49	fix: simplify _dioBinary in voice test page to avoid interceptor conflicts Remove shared interceptors from the binary Dio instance to prevent request dedup/retry interceptors from interfering with audio downloads. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 17:57:58 -08:00
hailin	f7d39d8544	fix: use theme-aware colors in voice test page for dark mode readability Replace hardcoded Colors.grey with Theme.of(context).colorScheme for result containers and status text so they're readable in both light and dark themes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 09:21:06 -08:00
hailin	d43baed3a5	feat: add OpenAI TTS/STT API endpoints for comparison testing - Add openai package to voice-service requirements - Add /api/v1/test/tts/synthesize-openai (tts-1/tts-1-hd/gpt-4o-mini-tts) - Add /api/v1/test/stt/transcribe-openai (gpt-4o-transcribe/whisper-1) - Add OPENAI_API_KEY and OPENAI_BASE_URL env vars to voice-service - Flutter test page: SegmentedButton to toggle Local/OpenAI provider - All endpoints maintain same response format for easy comparison Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 07:20:03 -08:00
hailin	ac0b8ee1c6	fix: rewrite voice test page using flutter_sound for both record and play - Remove record package dependency, use FlutterSoundRecorder instead - Use permission_handler for microphone permission (already in pubspec) - Proper temp file path via path_provider - Cleanup temp files after upload - Single package (flutter_sound) handles both recording and playback Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 05:41:10 -08:00
hailin	d4783a3497	fix: use temp directory path for audio recording instead of empty string The record package requires a valid file path. Empty string caused ENOENT (No such file or directory) on Android. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 05:39:07 -08:00
hailin	5d4fd96d43	feat: streaming claude-api engine, engineType override, fix voice test page - Claude API engine now uses streaming API (messages.stream) for real-time text delta output instead of waiting for full response - Agent controller accepts optional engineType body parameter to allow callers (e.g. voice pipeline) to select a specific engine - Fix voice_test_page.dart compilation error: replace audioplayers (not installed) with flutter_sound (already in pubspec.yaml) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 05:30:11 -08:00
hailin	6e832c7615	feat: add voice I/O test page in Flutter settings - TTS: text input → Kokoro synthesis → audio playback - STT: long-press record → faster-whisper transcription - Round-trip: record → STT → TTS → playback - Added /api/v1/test route to Kong gateway for voice-service - Accessible from Settings → 语音 I/O 测试 ("Voice I/O Test") Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 05:16:10 -08:00
hailin	e20936ee2a	feat: collapsible thinking node in chat timeline Thinking content auto-expands while streaming, auto-collapses when done. User can toggle with "Thinking ∨" button, matching Claude Code VSCode UX. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:34:58 -08:00
hailin	7afbd54fce	fix: rewrite voice pipeline for direct WebSocket I/O, fix TTS and navigation Root cause: Pipecat's WebsocketServerTransport creates its own WebSocket server on (host,port) and expects FrameProcessor subclasses. Our code was passing a FastAPI WebSocket object as 'host' and using plain STT/TTS/VAD service classes that aren't FrameProcessors. The pipeline crashed immediately when receiving audio, causing "disconnects when speaking". Changes: - base_pipeline.py: Complete rewrite — replaced Pipecat Pipeline with direct async loop: WebSocket → VAD → STT → Claude LLM → TTS → WebSocket. Supports barge-in (interrupt TTS when user speaks), audio chunking, and 24kHz→16kHz TTS resampling. - session_router.py: Pass WebSocket directly to pipeline instead of wrapping in AppTransport. - app_transport.py: Deprecated (no longer needed). - kokoro_service.py: Fix misaki compatibility (MutableToken→MToken rename), use correct Chinese voice 'zf_xiaoxiao', handle torch tensors. - main.py: Apply misaki monkey-patch before importing kokoro. - settings.py: Change default TTS voice from 'zh_female_1' (non-existent) to 'zf_xiaoxiao' (valid Kokoro-82M Chinese female voice). - requirements.txt: Remove pipecat-ai dependency, pin kokoro==0.3.5 + misaki==0.7.17, add Chinese NLP deps (pypinyin, cn2an, jieba, ordered-set). - agent_call_page.dart: Wrap each cleanup step in try/catch to ensure Navigator.pop() always executes after call ends. Add 3s timeout on session delete request. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 23:34:35 -08:00
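
The 24kHz→16kHz TTS resampling mentioned in commit 7afbd54fce could look roughly like the sketch below. The commit message does not say how base_pipeline.py resamples, so this is only an illustration: the function name `resample_pcm16`, the mono 16-bit little-endian PCM format, and the choice of linear interpolation are all assumptions, not the repository's code.

```python
import struct


def resample_pcm16(pcm: bytes, src_rate: int = 24000, dst_rate: int = 16000) -> bytes:
    """Resample mono 16-bit little-endian PCM by linear interpolation.

    Hypothetical helper: the name, the PCM format and the interpolation
    method are assumptions, not taken from the repository.
    """
    n_src = len(pcm) // 2  # two bytes per 16-bit sample
    if n_src == 0:
        return b""
    samples = struct.unpack(f"<{n_src}h", pcm[: n_src * 2])

    n_dst = n_src * dst_rate // src_rate
    out = []
    for i in range(n_dst):
        # Fractional position of output sample i on the source timeline.
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, n_src - 1)
        frac = pos - left
        # Blend the two neighbouring source samples.
        value = samples[left] * (1.0 - frac) + samples[right] * frac
        out.append(int(round(value)))
    return struct.pack(f"<{len(out)}h", *out)
```

Linear interpolation applies no low-pass filter before downsampling, so it can alias high-frequency content; a production pipeline would more likely use a filtered resampler from an audio library.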

78 Commits