hailin/it0 - it0 - AI Wolves Team

Commit Graph

Author	SHA1	Message	Date
hailin	440819add8	fix(dingtalk): 55s bridge timeout + batchSend fallback for expired webhooks Root cause of "Bridge call failed" errors: bridge /task endpoint defaults to 25s agent reply timeout, but LLM calls through the iConsulting gateway can take 30-60s. Fix: pass timeoutSeconds=55 explicitly in POST body. Also add batchSend fallback in routeToAgent: if the sessionWebhook has expired by the time the LLM replies (user sent a message, LLM took >30s, webhook window closed), the reply is now sent via proactive batchSend using senderStaffId instead of being silently dropped. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 23:33:56 -07:00
hailin	5874907300	fix(voice): suppress session terminate during DingTalk OAuth flow When the voice agent triggers DingTalk OAuth, the user leaves the app to authorize in DingTalk/browser, causing the LiveKit participant to disconnect. The voice-agent then calls DELETE /voice to terminate the session — but the user intends to return after completing OAuth. Fix: mark the session as "oauth_pending" in VoiceSessionController when oauth-trigger fires. If terminateVoiceSession is called while the flag is active (10-min grace), suppress the terminate and return 200 OK so the voice-agent exits cleanly. The session stays alive; when the user returns to the voice screen, voice/start + inject auto-resume it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 23:14:53 -07:00
hailin	5a66f85235	fix(dingtalk): senderStaffId-first routing + bridge response size cap Two binding paths store different DingTalk ID types: - OAuth binding stores staffId (resolved via unionId→userId at auth time) - Code binding stores senderId ($:LWCP_v1:$... format from bot message) DingTalk Stream API senderId != OAuth openId (different encodings), so primary lookup by senderId always missed OAuth-bound instances, requiring a fallback every time. Reverse the lookup order: try senderStaffId first (direct hit for OAuth binding), fall back to senderId (code binding). Also add MAX_RESPONSE_BYTES cap to httpPostJson — previously uncapped unlike the DingTalk API helpers which already had the 256KB guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 22:48:03 -07:00
hailin	e905559c46	fix(deploy): use ANTHROPIC_API_KEY env var (openclaw reads this, not CLAUDE_API_KEY) OpenClaw daemon checks ANTHROPIC_API_KEY env var on startup. We were passing CLAUDE_API_KEY which openclaw ignores, so it fell back to auth-profiles.json containing the raw Anthropic key, causing 401 from iConsulting LLM gateway. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 22:06:25 -07:00
hailin	de2d30fa6c	fix(deploy): use gateway key in auth-profiles.json instead of raw Anthropic key OpenClaw reads API key from auth-profiles.json. Was writing raw Anthropic key sk-ant-api03-... which gateway doesn't recognize. Must use effectiveApiKey (sk-gw-oc-... gateway key) so authentication with iConsulting LLM gateway succeeds. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 21:59:39 -07:00
hailin	54ed8290b1	fix(deploy): patch models.generated.js for LLM gateway + fix AGENTS.md template symlink After container starts, sed-replace api.anthropic.com with iConsulting LLM gateway URL in all models.generated.js files (ANTHROPIC_BASE_URL env alone is not enough since baseUrl is hardcoded). Also create missing AGENTS.md template symlink so OpenClaw does not 500 on workspace init. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 15:25:52 -07:00
hailin	50401660ef	fix(dingtalk): exclude removed instances from routing + clear binding on remove Two bugs fixed: 1. findByDingTalkUserId now filters status != 'removed' so a re-bound new instance is not shadowed by an old removed one with the same DingTalk user ID. 2. When an agent is deleted (removed), its dingtalkUserId is cleared so the DingTalk ID is freed for reuse by the next binding. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 14:49:00 -07:00
hailin	a8b5571aea	fix(deploy): pre-create workspace dir with correct ownership for OpenClaw containers OpenClaw runs as node user (uid 1000) but the host directory was created as root, causing EACCES when the container tried to create /home/node/.openclaw/workspace. Now mkdir workspace/ and chown -R 1000:1000 before starting the container. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 14:32:50 -07:00
hailin	495407d25b	fix(agent): pass sessionId to system prompt for text chat OAuth trigger Text sessions were not passing sessionId to SystemPromptBuilder, causing Claude to use the `initiate_dingtalk_binding` custom tool (claude_api only). When the engine is claude_agent_sdk, this tool does not exist → 404. Fix: pass session.id as sessionId to systemPromptBuilder.build() in agent.controller.ts. Claude will now use the wget oauth-trigger endpoint for ALL session types (text and voice), which works with every engine. Also: store userId (staffId) as the DingTalk binding ID when resolvable, falling back to openId. Bot messages deliver senderStaffId which matches userId, not openId — this prevents the "binding not found" routing failure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 14:20:58 -07:00
hailin	3ca3982c28	fix(dingtalk): correct greeting flow — use userId (staffId) for batchSend Problem: sendGreeting() was passing openId as `userIds` to batchSend, but the API requires the enterprise staffId (userId). This caused HTTP 400 "staffId.notExisted" for every OAuth-bound greeting. Fix: 1. completeOAuthBinding now resolves unionId → userId via oapi.dingtalk.com/topapi/user/getbyunionid with corp app token. Non-fatal: if the user has no enterprise context, greeting is skipped with a clear log explaining why (no Contact.User.Read permission or user is not an enterprise member). 2. sendGreeting accepts userId (staffId) and openId separately; uses the correct staffId for batchSend. If userId is undefined, emits a WARN and skips (user gets greeting on first message instead). 3. routeToAgent now tries senderStaffId as fallback if senderId lookup misses — handles edge cases where DingTalk delivers staffId in senderId. 4. Added detailed logging: all three IDs (openId, unionId, userId) are logged at binding time so future issues are immediately diagnosable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 13:53:08 -07:00
hailin	13f2d68754	feat(ux): agent list refresh + OAuth keep-alive + deploy token fix Flutter: - my_agents_page: refresh agent list on every My Agents tab tap (ref.invalidate in ScaffoldWithNav.onDestinationSelected) - chat_page + my_agents_page: activate AudioSession before launching OAuth browser so iOS keeps network connections alive in background; deactivate when app resumes or binding polling completes agent-service deploy: - Write openclaw.json with correct gateway token and auth-profiles.json with API key BEFORE starting the container, so OpenClaw and bridge always agree on the auth token (fixes token_mismatch on new deployments) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 13:26:05 -07:00
hailin	5cf72c4780	feat(dingtalk+bridge): event-based agent reply + greeting on binding openclaw-bridge: - index.ts: /task endpoint now calls chatSendAndWait() with idempotencyKey (removes broken timeoutSeconds param; uses caller-supplied msgId for dedup) - openclaw-client.ts: added onEvent() subscription + chatSendAndWait() that subscribes to 'chat' WS events, waits for state='final' matching runId, and extracts text from the message payload dingtalk-router: - After OAuth binding completes, sends a proactive greeting to the user via DingTalk batchSend API (/v1.0/robot/oToMessages/batchSend) introducing the agent by name and explaining what it can do Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 13:18:52 -07:00
hailin	54e6f13405	fix(dingtalk): bridge 请求参数错误导致消息无法转发给小龙虾问题：routeToAgent 调用 OpenClaw bridge /task 时传了 timeoutSeconds （bridge schema 不认识），且缺少必须字段 idempotencyKey，导致 bridge 返回 INVALID_REQUEST，机器人沉默不回复。修复： - 移除 timeoutSeconds（不是 bridge API 参数） - 改用 msg.msgId 作为 idempotencyKey（每条消息唯一，满足 bridge 要求）根因定位： docker logs openclaw-83cc9ac3 显示 "invalid chat.send params: must have required property 'idempotencyKey'; at root: unexpected property 'timeoutSeconds'" Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 11:53:52 -07:00
hailin	64499a5d86	feat(dingtalk): 小龙虾招募全语音/文字引导流程 + OAuth 一键授权卡片 ## 功能说明用户通过语音或文字说「帮我招募一只小龙虾」，iAgent 全程引导完成 OpenClaw 实例创建 + 钉钉 OAuth 一键授权绑定。 ## 核心设计 - 语音场景 (claude_agent_sdk): Claude 通过 Bash/wget 调用内部 HTTP 端点触发 OAuth，绕开 ToolExecutor 限制，两引擎均兼容 - 文字场景 (claude_api): 使用 initiate_dingtalk_binding 自定义工具，通过 uiEvent 机制传递 OAuth URL ## agent-service 变更 - agent-engine.port.ts: EngineStreamEvent 联合类型新增 oauth_prompt - allowed-tools-resolver.service.ts: initiate_dingtalk_binding 加入 ALL_SDK_TOOLS / admin / operator 工具白名单 - tool-executor.ts: 新增 executeInitiateDingTalkBinding()，调用内部 oauth/init 端点获取 OAuth URL，返回 uiEvent - claude-api-engine.ts: 在 tool_result 之后检查 result.uiEvent 并 yield 出去；buildToolDefinitions 注册 initiate_dingtalk_binding schema - system-prompt-builder.ts: - SystemPromptContext 新增 sessionId? 字段 - 语音 session (sessionId 存在) → Step 3 使用 wget 调用 POST /sessions/{sessionId}/dingtalk/oauth-trigger（两引擎通用） - 文字 session (无 sessionId) → Step 3 调用 initiate_dingtalk_binding 工具（claude_api 专用） - voice-session.controller.ts: - 注入 AgentStreamGateway / DingTalkRouterService / AgentInstanceRepository - startVoiceSession: 提前确定 sessionId，在 build() 前传入，使系统提示能内嵌正确的端点 URL - 新增 POST :sessionId/dingtalk/oauth-trigger — 无 JWT（内部端点，由 Claude Bash 工具调用），sessionId 作为能力令牌；生成 OAuth URL 并通过 gateway.emitStreamEvent 直接推送 oauth_prompt 事件到 WS 流 ## voice-agent 变更 - agent.py: 构造 AgentServiceLLM 时传入 room=ctx.room - agent_llm.py: - __init__ 增加 room 参数，存储为 self._room - 新增 _publish_oauth_prompt(evt_data): null-safe，通过 LiveKit publish_data(topic="oauth_prompt") 推送到 Flutter - _do_inject_voice / _do_inject / _do_stream_voice / _do_stream: 处理 oauth_prompt 事件 → asyncio.create_task(_publish_oauth_prompt) - 替换已弃用的 asyncio.ensure_future / get_event_loop().create_task → asyncio.create_task（Python 3.10+ 兼容） ## Flutter 变更 - agent_call_page.dart: DataReceivedEvent 监听 topic="oauth_prompt"，解析 url/instanceName，弹出 _showOAuthBottomSheet（深色主题，🦞 图标，「立即授权」按钮 launchUrl externalApplication） - stream_event.dart: 新增 OAuthPromptEvent(url, instanceId, instanceName) - stream_event_model.dart: toEntity() 新增 'oauth_prompt' case - chat_message.dart: MessageType 枚举新增 oauthPrompt - chat_providers.dart: _handleStreamEvent 新增 OAuthPromptEvent case，生成 type=oauthPrompt 的 ChatMessage（metadata 含 url/instanceName） - chat_page.dart: 新增 oauthPrompt 时间线节点 + _OAuthPromptCard 组件（「立即授权」按钮，launchUrl externalApplication）；import url_launcher ## 修复的关键 Bug 1. [严重] initiate_dingtalk_binding 只对 claude_api 有效，语音默认用 claude_agent_sdk → 新 wget 端点两引擎均可用 2. [严重] 文字聊天页面不处理 oauth_prompt 事件（静默丢弃）→ 补全 Flutter 4 处代码（entity/model/provider/page） 3. [中] _publish_oauth_prompt 缺 local_participant null 检查 → 已修复 4. [轻] asyncio.ensure_future / get_event_loop() 弃用警告 → 已修复 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 11:22:06 -07:00
hailin	3d626aebb5	feat(dingtalk): OAuth one-tap binding + voice tool + public Kong route - DingTalk binding UX replaced with OAuth one-tap flow: - GET /api/v1/agent/channels/dingtalk/oauth/init returns OAuth URL - GET /api/v1/agent/channels/dingtalk/oauth/callback (public, no JWT) exchanges code+state for openId, saves binding, returns HTML page - oauthStates Map with 10-min TTL; state validated before exchange - msg.senderId (openId) aligned with OAuth openId for consistent routing - CODE_TTL_MS extended from 5→15 min (fallback code method preserved) - Kong: dingtalk-oauth-public service declared before agent-service so callback path matches without JWT plugin - Voice sessions: use stored session.systemPrompt + voice rules; allowedTools includes Bash so Claude can call internal APIs - Flutter _DingTalkBindSheet: OAuth-first UX with code-based fallback phases: idle→loadingOAuth→waitingOAuth→success + polling every 2s - docker-compose: IT0_BASE_URL env var for agent-service (redirect URI) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 09:09:00 -07:00
hailin	2d0bdbd27f	feat(agent): voice-triggered DingTalk binding + GET instances by user - Add GET /api/v1/agent/instances/user/:userId endpoint so Claude can look up the caller's agent instances without knowing the ID upfront - Update SystemPromptBuilder DingTalk section with centralized binding flow (one-time code via iAgent DingTalk bot, no per-instance creds) - VoiceSessionController.startVoiceSession now extracts userId from JWT and builds a full iAgent system prompt (userId + DingTalk instructions) so Claude knows who is speaking and how to call the binding API - VoiceSessionManager.executeTurn now uses the session's stored system prompt (base context + voice rules) and allows the Bash tool so Claude can call internal APIs via wget during voice conversations User flow: speak "帮我绑定钉钉" → Claude lists instances → generates code via POST /api/v1/agent/channels/dingtalk/bind/:id → speaks code letter-by-letter → user sends code in DingTalk → binding completes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 08:49:13 -07:00
hailin	db0e1f1439	fix(dingtalk): robustness pass — 5 bugs fixed, stability 10/10 Critical fixes: - ws.on('message') fully wrapped in try/catch — uncaught exception in wsSend() no longer propagates to EventEmitter boundary and crashes process - wsSend() helper: checks readyState === OPEN before send(), never throws - Stale-WS guard: close/message events from old WS ignored after reconnect (ws !== this.ws check); terminateCurrentWs() closes old WS before new one - Queue tail: .catch(() => {}) appended to guarantee promise always resolves, preventing permanently dead queue tail from silently dropping future tasks - DISCONNECT frame handler: force-close + reconnect immediately High fixes: - sessionWebhookExpiredTime unit auto-detection: values < 1e11 treated as seconds (×1000), values >= 1e11 treated as ms — prevents always-blocked reply - httpsPost response capped at 256 KB to prevent memory spike on bad response Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 08:22:08 -07:00
hailin	8751c85881	feat(dingtalk): unified DingTalk bot router with binding flow - Add DingTalkRouterService: maintains single DingTalk Stream WS connection, handles binding codes, routes messages to agent containers - Add AgentChannelController: POST bind/:id, GET status/:id, POST unbind/:id - Add findByDingTalkUserId() to AgentInstanceRepository - Add dingTalkUserId field to AgentInstance entity + migration 011 - Register DingTalkRouterService + AgentChannelController in AgentModule - Add IT0_DINGTALK_CLIENT_ID/SECRET env vars to docker-compose.yml - Flutter: DingTalk bind UI in _InstanceCard (bottom sheet with code display, countdown, auto-poll, open DingTalk deep link, bound badge) Robustness improvements in DingTalkRouterService: - Concurrent connect guard (connecting flag) - Periodic cleanup timer for dedup/rateWindows/bindingCodes maps - Non-text message graceful reply - Empty senderStaffId guard - serverHost null guard before bridge call - unref() cleanup timers from event loop Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 08:12:27 -07:00
hailin	5ec6f113cd	feat(agent): DingTalk channel binding support in instance controller + system prompt - agent-instance.controller.ts: accept dingTalkClientId/dingTalkClientSecret in POST /instances body, forward to deploy service - system-prompt-builder.ts: add DingTalk 5-step binding guide for iAgent so the AI can walk users through connecting their DingTalk account Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 05:12:13 -07:00
hailin	90f11fc572	fix(agent): pass IT0_AGENT_SERVICE_URL env var to openclaw container supervisord uses %(ENV_IT0_AGENT_SERVICE_URL)s expansion which fails if the var is not present, crashing the entire supervisor process. Add AGENT_SERVICE_PUBLIC_URL config and inject it via docker run -e. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 05:01:55 -07:00
hailin	3790284bc9	fix(agent): use wget instead of curl for internal API calls (curl not in container) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 03:29:53 -07:00
hailin	b8979d521e	fix(agent): AgentInstanceRepository use DataSource directly, not TenantAwareRepository agent_instances is in public schema — no tenant context needed. Fixes 'Tenant context not initialized' when iAgent calls internal API via Bash. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 03:19:01 -07:00
hailin	5c5c365736	chore(agent): add empty prisma dir to fix Docker build COPY step Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 03:10:19 -07:00
hailin	49ad47cf59	fix(agent): non-null assertion for serverHost/sshUser in deployToUserServer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 03:07:42 -07:00
hailin	b87cebf465	feat(agent): inject userId into system prompt + fix agent-instance nullable columns - SystemPromptBuilder: add userId/userEmail to context, expose internal API curl commands for OpenClaw creation - agent.controller.ts: extract userId from JWT, build system prompt via SystemPromptBuilder so iAgent knows current user - agent.module.ts: register SystemPromptBuilder as provider - agent-instance.entity.ts: make serverHost/sshUser nullable (pool mode doesn't set these upfront) - DB: ALTER TABLE agent_instances DROP NOT NULL on server_host/ssh_user Now iAgent can create 小龙虾 instances autonomously when user asks in natural language. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-08 03:05:15 -07:00
hailin	d56486a4aa	fix(agent-service): use hailin168/openclaw-bridge Docker Hub image The it0hub org doesn't exist on Docker Hub. Switch to hailin168/openclaw-bridge:latest which was built and pushed from openclaw source + IT0 bridge. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-07 12:21:27 -08:00
hailin	ad46e45181	fix(agent-service): remove duplicate remove() override conflicting with TenantAwareRepository base Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-07 11:29:40 -08:00
hailin	2086eb8109	feat(openclaw): Phase 2 — heartbeat endpoint + iAgent OpenClaw deployment awareness - agent-instance.controller: POST :id/heartbeat — bridge calls this every 60s; auto-transitions status from deploying→running when gateway is confirmed connected - system-prompt-builder: teach iAgent about OpenClaw deployment capability: create/list/stop/remove instance API endpoints, when to trigger deployment, and what to tell users about channel connectivity (Telegram/WhatsApp etc.) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-07 11:17:35 -08:00
hailin	7d5840c245	feat(openclaw): Phase 1 — server pool + agent instance deployment infrastructure ## inventory-service - New: pool_servers table (public schema, platform-admin managed) - New: PoolServer entity, PoolServerRepository, PoolServerController - CRUD endpoints at /api/v1/inventory/pool-servers - Internal /deploy-creds endpoint (x-internal-api-key protected) for SSH key retrieval - increment/decrement endpoints for capacity tracking ## agent-service - New: agent_instances table (tenant schema) - New: AgentInstance entity, AgentInstanceRepository, AgentInstanceController - New: AgentInstanceDeployService — SSH-based docker deployment - Queries pool server availability from inventory-service - AES-256 encrypts OpenClaw gateway token at rest - Allocates host ports in range 20000-29999 - Fires docker run for it0hub/openclaw-bridge:latest - Async deploy with error capture - Added ssh2 dependency for SSH execution - Added INVENTORY_SERVICE_URL, INTERNAL_API_KEY, VAULT_MASTER_KEY to docker-compose ## openclaw-bridge (new package) - packages/openclaw-bridge/ — custom Docker image - Two processes via supervisord: OpenClaw gateway + IT0 Bridge (Node.js) - IT0 Bridge exposes REST API on port 3000: GET /health, GET /status, POST /task, GET /sessions, GET /metrics - Connects to OpenClaw gateway at ws://127.0.0.1:18789 via WebSocket RPC - Sends heartbeat to IT0 agent-service every 60s - Dockerfile: multi-stage build (openclaw source + bridge TS compilation) ## Web Admin - New: /server-pool page — list/add/edit/delete pool servers with capacity bars - New: /openclaw-instances page — cross-tenant instance monitoring with status filter - Sidebar: added 服务器池 (Database icon) + OpenClaw 实例 (Boxes icon) to platform_admin nav ## Flutter App - my_agents_page: rewritten to show real AgentInstance data from /api/v1/agent/instances - Added AgentInstance model with status-driven UI (running/deploying/stopped/error) - Status badges with color coding + spinner for deploying state - Summary chips showing running vs stopped counts - api_endpoints.dart: added agentInstances endpoint ## Design docs - OPENCLAW_INTEGRATION_PLAN.md: complete architecture document with all confirmed decisions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-07 11:11:21 -08:00
hailin	4c7c05eb37	feat(stt): support auto language detection for mixed Chinese-English input - Flutter: language='auto' omits the language field → backend receives none - Backend: no language field → passes undefined to STT service - STT service: language=undefined → omits language param from Whisper request - Whisper auto-detects language per utterance when no hint is provided Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 08:13:26 -08:00
hailin	947a47869e	fix(agent-service): use https.request for Whisper STT to bypass self-signed cert Node 18 native fetch (undici) ignores https.Agent, causing fetch failed on the self-signed proxy at 67.223.119.33:8443. Switch to https.request with rejectUnauthorized: false which works reliably. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 07:51:37 -08:00
hailin	73eb4350fb	fix(agent-service): strip /v1 suffix from OPENAI_BASE_URL in STT service OPENAI_BASE_URL=https://67.223.119.33:8443/v1 already includes /v1, so the URL was being built as .../v1/v1/audio/transcriptions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 07:27:16 -08:00
hailin	15ee296fcd	fix(agent-service): add multer as explicit runtime dependency multer was only transitively available; pnpm strict mode blocks it. Also adds @types/multer for TypeScript compilation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 07:10:22 -08:00
hailin	07783ccad2	fix(agent-service): add @types/multer to devDependencies Fixes TS2307 build error: Cannot find module 'multer' or its type declarations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 07:03:54 -08:00
hailin	2182149c4c	feat(chat): voice-to-text fills input box instead of auto-sending - Add POST /api/v1/agent/transcribe endpoint (STT only, no agent trigger) - Add transcribeAudio() to chat datasource and provider - VoiceMicButton now fills the text input field with transcript; user reviews and sends manually - Add OPENAI_API_KEY/OPENAI_BASE_URL to agent-service in docker-compose Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 07:01:39 -08:00
hailin	a2af76bcd7	feat(agent-service): add voice message endpoint with Whisper STT and async interrupt New endpoint: POST /api/v1/agent/sessions/:sessionId/voice-message - Accepts multipart/form-data audio file (any format Whisper supports) - Transcribes via OpenAI Whisper API (routed through existing proxy) - If a task is currently running in the session → hard-interrupts it first (same cancel+inject pattern as text inject, triggered by voice command) - Otherwise → starts a fresh task with the transcript - Returns { sessionId, taskId, transcript } so client can subscribe to WS stream This enables WhatsApp-style push-to-talk and doubles as an async voice interrupt into any active agent workflow, bypassing the need for speaker diarization (whoever presses record owns the message). New files: infrastructure/stt/openai-stt.service.ts — OpenAI Whisper client, manually builds multipart/form-data, supports self-signed proxy cert Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 03:12:03 -08:00
hailin	d097c64c81	feat(voice): add per-turn interrupt support to VoiceSessionManager Implements a two-level abort controller design to support real-time interruption when the user speaks while the agent is still responding: sessionAbortController (session-scoped) - Created once when startSession() is called - Fired only by terminateSession() (user hangs up) - Propagated into each turn via addEventListener turnAbort (per-turn, stored as handle.currentTurnAbort) - Created fresh at the start of each executeTurn() call - Stored on the VoiceSessionHandle so injectMessage() can abort it - When a new inject arrives while a turn is running, injectMessage() calls turnAbort.abort() BEFORE enqueuing the new message Interruption flow: 1. User speaks mid-response → LiveKit stops TTS playback (client-side) 2. STT utterance → POST voice/inject → injectMessage() fires 3. handle.currentTurnAbort.abort() called → sets aborted flag 4. for-await loop checks turnAbort.signal.aborted on next SDK event → break 5. catch block NOT reached (break ≠ exception) → no error event emitted 6. finally block saves partial text with "[中断]" suffix to history 7. New message dequeued → fresh executeTurn() starts immediately Why no "Agent error" message plays to the user: - break exits the for-await loop silently, not via exception - The catch block's error-event emission is guarded by err?.name !== 'AbortError' AND requires an actual exception; a plain break never enters catch - Empty or partial responses are filtered by `if response:` in agent.py Also update module-level JSDoc with full architecture explanation covering the long-lived run loop design, two-level abort hierarchy, tenant context injection pattern, and SDK session resume across turns. Update agent.py module docstring to document voice session lifecycle and interruption flow for future maintainers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-04 04:25:57 -08:00
hailin	635cca18fa	feat(voice): long-lived agent session with proper hangup termination Replace the per-turn POST /tasks approach for voice calls with a long-lived agent run loop tied to the call lifecycle: agent-service: - Add AsyncQueue<T> utility for blocking message relay - Add VoiceSessionManager: spawns one background run loop per voice call, accepts injected messages, terminates cleanly on hangup - Add VoiceSessionController with 3 endpoints: POST /api/v1/agent/sessions/voice/start (call start) POST /api/v1/agent/sessions/:id/voice/inject (each speech turn) DELETE /api/v1/agent/sessions/:id/voice (user hung up) - Register VoiceSessionManager + VoiceSessionController in agent.module.ts voice-agent: - AgentServiceLLM: add start_voice_session(), terminate_voice_session(), inject_text_message() (voice/inject-aware), _do_inject_voice() - AgentServiceLLMStream._run(): use voice/inject path when voice session is active; fall back to per-task POST for text-chat / non-SDK engines - entrypoint(): call start_voice_session() after session.start(); register _on_room_disconnect that calls terminate_voice_session() so the agent is always killed when the user hangs up Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-04 04:01:02 -08:00
hailin	6ca8aab243	fix(agent-service): store proper title in session metadata, exclude systemPrompt from list API Two issues fixed: 1. agent.controller.ts — on the FIRST task of each session, write title+voiceMode into session.metadata so the client can display a meaningful conversation title: - Text sessions: metadata.title = first 40 chars of user prompt - Voice sessions: metadata.title = '' + metadata.voiceMode = true (Flutter renders these as '语音对话 M/D HH:mm') titleSet flag prevents overwriting the title on subsequent turns of the same session. 2. session.controller.ts — listSessions() now returns a DTO instead of the raw entity. systemPrompt is an internal engine instruction and is explicitly excluded from the response. The client receives { id, status, engineType, metadata, createdAt, updatedAt }.	2026-03-04 02:39:47 -08:00
hailin	9ed80cd0bc	feat: implement complete commercial monetization loop (Phases 1-4) ## Phase 1 - Token Metering + Quota Enforcement ### Usage Tracking - agent-service: add UsageRecord entity (per-tenant schema) tracking inputTokens/outputTokens/costUsd per AI task - Modify all 3 AI engines (claude-api, claude-code-cli, claude-agent-sdk) to emit separate input/output token counts in the `completed` event - claude-api-engine: costUsd = (input3 + output15) / 1,000,000 (claude-sonnet-4-5 pricing: $3/MTok in, $15/MTok out) - agent.controller: persist UsageRecord and publish `usage.recorded` event to Redis Streams on every task completion (non-blocking) - shared/events: new events UsageRecordedEvent, SubscriptionChangedEvent, QuotaExceededEvent, PaymentReceivedEvent ### Quota Enforcement - TenantInfo: add maxServers, maxUsers, maxStandingOrders, maxAgentTokensPerMonth fields - TenantContextMiddleware: rewritten to query public.tenants table for real quota values; 5-min in-memory cache; plan-based fallback on error - TenantContextService: getTenant() returns null instead of throwing; added getTenantOrThrow() for strict callers - inventory-service/server.controller: 429 when maxServers exceeded - ops-service/standing-order.controller: 429 when maxStandingOrders exceeded - auth-service/auth.service: 429 when maxUsers exceeded - 002-create-tenant-schema-template.sql: add usage_records table ## Phase 2 - billing-service (New Microservice, port 3010) ### Domain Layer (public schema, all UUIDs) Entities: Plan, Subscription, Invoice, InvoiceItem, Payment, PaymentMethod, UsageAggregate Domain services: - SubscriptionLifecycleService: full state machine (trialing -> active -> past_due -> cancelled/expired); upgrades immediate, downgrades at period end - InvoiceGeneratorService: monthly invoice = base fee + overage charges; proration item for mid-cycle upgrades - OverageCalculatorService: (totalTokens - includedTokens) * overageRate ### Infrastructure (all repos use DataSource directly, NOT TenantAwareRepository) - PlanRepository, SubscriptionRepository, InvoiceRepository (atomic transaction for invoice+items), PaymentRepository (payments + methods), UsageAggregateRepository (UPSERT via ON CONFLICT for atomic accumulation) ### Application Use Cases - CreateSubscriptionUseCase: called on tenant registration - ChangePlanUseCase: upgrade (immediate + proration) or downgrade (scheduled) - CancelSubscriptionUseCase: immediate or at-period-end - GenerateMonthlyInvoiceUseCase: cron target (1st of month 00:05 UTC); generates invoices, renews periods, applies scheduled downgrades - AggregateUsageUseCase: Redis Streams consumer group billing-service, upserts monthly usage aggregates from usage.recorded events - CheckTokenQuotaUseCase: hard limit enforcement per plan - CreatePaymentSessionUseCase + HandlePaymentWebhookUseCase ### REST API - GET /api/v1/billing/plans - GET/POST /api/v1/billing/subscription (+ /upgrade, /cancel) - GET /api/v1/billing/invoices (paginated) - GET /api/v1/billing/invoices/:id - POST /api/v1/billing/invoices/:id/pay - GET /api/v1/billing/usage/current + /history - CRUD /api/v1/billing/payment-methods - POST /api/v1/billing/webhooks/{stripe,alipay,wechat,crypto} ### Plan Seed (auto on startup via PlanSeedService) - free: $0/mo, 100K tokens, no overage, hard limit 100% - pro: $49.99/mo, 1M tokens, $8/MTok, hard limit 150% - enterprise: $199.99/mo, 10M tokens, $5/MTok, no hard limit ## Phase 3 - Payment Provider Integration ### PaymentProviderRegistry (Strategy Pattern, mirrors EngineRegistry) All providers use @Optional() injection; unconfigured providers omitted - StripeProvider: PaymentIntent API; webhook via stripe.webhooks.constructEvent - AlipayProvider: alipay-sdk; Native QR (precreate); RSA2 signature verify - WeChatPayProvider: v3 REST; Native Pay code_url; AES-256-GCM decrypt; HMAC-SHA256 request signing and webhook verification - CryptoProvider: Coinbase Commerce; hosted checkout; HMAC-SHA256 verify ### WebhookController All 4 webhook endpoints are public (no JWT) for payment provider callbacks. rawBody: true enabled in main.ts for signature verification. ## Infrastructure Changes - docker-compose.yml: billing-service container (port 13010); added as dependency of api-gateway - kong.yml: /api/v1/billing routes (JWT); /api/v1/billing/webhooks (public) - 005-create-billing-tables.sql: 7 billing tables + invoice sequence + ALTER tenants to add quota columns - run-migrations.ts: 005 runs as part of shared schema step ## Phase 4 - Frontend ### Web Admin (Next.js) New pages: - /billing: subscription card + token usage bar + warning banner + invoices - /billing/plans: comparison grid with USD/CNY toggle + upgrade/downgrade flow - /billing/invoices: paginated table with Pay Now button Sidebar: Billing group (CreditCard icon, 3 sub-items) i18n: billing keys added to en + zh sidebar translations ### Flutter App New feature module it0_app/lib/features/billing/: - BillingOverviewPage: plan card + token LinearProgressIndicator + latest invoice + upgrade button - BillingProvider (FutureProvider): parallel fetch subscription/quota/invoice Settings page: "订阅与用量" entry card Router: /settings/billing sub-route Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 21:09:17 -08:00
hailin	7fb0d1de95	refactor: remove Speechmatics STT integration entirely, default to OpenAI - Delete speechmatics_stt.py plugin - Remove speechmatics branch from voice-agent entrypoint - Remove livekit-plugins-speechmatics dependency - Change default stt_provider to 'openai' in entity, controller, and UI - Remove SPEECHMATICS_API_KEY from docker-compose.yml - Remove speechmatics option from web-admin settings dropdown Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 04:58:38 -08:00
hailin	e32a3a9800	fix: use @TenantId() decorator in VoiceConfigController for JWT tenant extraction Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 22:30:37 -08:00
hailin	f9c47de04b	feat: add STT provider switching (OpenAI ↔ Speechmatics) in settings - Add VoiceConfig entity/repo/service/controller in agent-service for per-tenant STT provider persistence (default: speechmatics) - Add Speechmatics STT plugin in voice-agent with livekit-plugins-speechmatics - Modify voice-agent entrypoint for 3-way STT selection: metadata > agent-service config > env var fallback - Add "Voice" section in web-admin settings page with STT provider dropdown - Add i18n translations (en/zh) for voice settings - Add SPEECHMATICS_API_KEY env var in docker-compose Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 22:13:18 -08:00
hailin	da17488389	feat: voice mode event filtering — skip tool/thinking events for Agent SDK 1. Remove on_enter greeting entirely (no more race condition) 2. voice-agent sends voiceMode: true when engine_type is claude_agent_sdk 3. AgentController.runTaskStream() filters thinking, tool_use, tool_result events in voice mode — only text, completed, error reach the client 4. Detailed logging: each event logged with [FILTERED-voice] tag when skipped Claude API mode is completely unaffected (voiceMode defaults to false). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 02:56:41 -08:00
hailin	4987cad881	fix: increase body parser limit to 50mb for large PDF uploads Claude API supports up to 32MB PDFs; base64 encoding adds ~33% overhead. 50mb body limit covers the maximum single-document upload case. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 05:35:43 -08:00
hailin	c9367ee22a	fix: PDF attachments sent as document blocks instead of image blocks PDF files were incorrectly wrapped as type:'image' content blocks, causing Claude API to reject them as "Invalid image data". - conversation-context.service: check mediaType for application/pdf, use type:'document' block (Anthropic native PDF support) instead - claude-agent-sdk-engine: detect both 'image' and 'document' blocks when deciding to build multimodal SDK prompt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 05:27:41 -08:00
hailin	2c657e2b4c	fix: use NestJS native useBodyParser instead of direct express import The direct `import * as express from 'express'` caused a MODULE_NOT_FOUND error in the Docker production image since express is only available as a transitive dependency via @nestjs/platform-express. Use NestExpressApplication.useBodyParser() which is the official NestJS API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 04:01:54 -08:00
hailin	b9c3bfdf91	feat: add multimodal image support to Claude Agent SDK engine - SDK engine now constructs AsyncIterable<SDKUserMessage> with image content blocks when attachments are present in conversationHistory, using the SDK's native multimodal prompt format - CLI engine logs a warning when images are detected, since the `-p` flag only accepts text (upstream Claude CLI limitation) - Both SDK and API engines now fully support multimodal image input Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 03:38:59 -08:00
hailin	e4c2505048	feat: add multimodal image input with streaming markdown optimization Two major features in this commit: 1. Streaming Markdown Rendering Optimization - Replace deprecated flutter_markdown with gpt_markdown (active, AI-optimized) - Real-time markdown rendering during streaming (was showing raw syntax) - Solid block cursor (█) instead of AnimationController blink - 80ms token throttle buffer reducing rebuilds from per-token to ~12.5/sec - RepaintBoundary isolation for markdown widget repaints - StreamTextWidget simplified from StatefulWidget to StatelessWidget 2. Multimodal Image Input (camera + gallery + display) - Flutter: image_picker for gallery/camera, base64 encoding, attachment preview strip with delete, thumbnails in sent messages - Data layer: List<String>? → List<Map<String, dynamic>>? for structured attachment payloads through datasource/repository/usecase - ChatAttachment model with base64Data, mediaType, fileName - ChatMessage entity + ChatMessageModel both support attachments field - Backend DTO, Entity (JSONB), Controller, ConversationContextService all extended to receive, store, and reconstruct Anthropic image content blocks in loadContext() - Claude API engine skips duplicate user message when history already ends with multimodal content blocks - NestJS body parser limit raised to 10MB for base64 image payloads - Android CAMERA permission added to manifest - Image.memory uses cacheWidth/cacheHeight for memory efficiency - Max 5 images per message enforced in UI Data flow: ImagePicker → base64Encode → ChatAttachment → POST body → DB (JSONB) → loadContext → Anthropic image content blocks → Claude API Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 03:24:17 -08:00
hailin	50dbb641a3	fix: comprehensive hardening of agent task cancel/inject/approve flows 6 rounds of systematic audit identified and fixed 14 bugs across backend controller and Flutter client: ## Backend (agent.controller.ts) Security & Tenant Isolation: - Add @TenantId + ForbiddenException check to cancelTask, injectMessage, approveCommand — all 4 write endpoints now enforce tenant isolation - Add tenantId check on session reuse in executeTask to prevent cross-tenant session hijacking Architecture & Correctness: - Extract shared runTaskStream() from inline fire-and-forget block, used by both executeTask and injectMessage to reduce duplication - Use session.engineType (not getActiveEngine()) in cancelTask, injectMessage, approveCommand — fixes wrong-engine-cancel when global engine config is switched after task creation - Add concurrent task prevention: executeTask checks for existing RUNNING task on same session and cancels it before starting new one - Add runningTasks Map to track task promises, awaitTaskCleanup() helper with 3s timeout for inject to wait for partial text save - captureSdkSessionId() captures SDK session ID into metadata without DB save (callers persist), preventing fire-and-forget race Cancel/Reject Improvements: - cancelTask: idempotent (returns early if already CANCELLED/COMPLETED), session stays 'active' (was 'cancelled'), emits cancelled WS event - approveCommand reject: session stays 'active' (was 'cancelled'), now emits cancelled WS event so Flutter stream listeners clean up - approveCommand approved: collect text events and save assistant response to conversation history on completion (was missing) Minor: - task.result! non-null assertion → task.result ?? 'Unknown error' - Add findRunningBySessionId() to TaskRepository ## Flutter API Contract Fix: - approveCommand: route changed from /api/v1/ops/approvals/:id/approve to /api/v1/agent/tasks/:id/approve with {approved: true} body - rejectCommand: route changed from /api/v1/ops/approvals/:id/reject to /api/v1/agent/tasks/:id/approve with {approved: false} body Resource Management: - ChatNotifier.dispose() now disconnects WebSocket to prevent connection leak when navigating away from chat Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 22:20:46 -08:00

1 2

81 Commits