Add a CRITICAL note and explicit IF/ELSE branching so Claude never calls
DingTalk endpoints for a Feishu binding, or vice versa.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## Changes
### openclaw-bridge: POST /skill-inject
- New endpoint writes SKILL.md to ~/.openclaw/skills/{name}/ inside the container volume
- OpenClaw gateway file watcher picks it up within 250ms (no restart needed)
- Optionally calls sessions.delete RPC after write so the next user message starts
a fresh session that loads the new skill directory immediately (zero-downtime)
- Path traversal guard on skill name (rejects names containing /, .., or \)
- OPENCLAW_HOME env var configurable (default: /home/node/.openclaw)
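
A minimal sketch of the path traversal guard described above. The helper names `isSafeSkillName` and `skillFilePath` are hypothetical; the actual bridge code may structure this differently.

```typescript
// Hypothetical helper: rejects skill names that could escape the skills directory.
function isSafeSkillName(name: string): boolean {
  if (name.length === 0) return false;
  // Reject path separators and parent-directory sequences.
  return !(name.includes("/") || name.includes("\\") || name.includes(".."));
}

// Builds the SKILL.md path under OPENCLAW_HOME, throwing on an unsafe name.
function skillFilePath(openclawHome: string, name: string): string {
  if (!isSafeSkillName(name)) {
    throw new Error(`invalid skill name: ${name}`);
  }
  return `${openclawHome}/skills/${name}/SKILL.md`;
}
```

Validating the name before building the path keeps the check independent of how the file is later written.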
### agent-service: POST /api/v1/agent/instances/:id/skills
- New endpoint in AgentInstanceController proxies skill injection requests to the
instance's bridge (http://{serverHost}:{hostPort}/skill-inject)
- Guards: instance must be 'running', serverHost/hostPort must be set, content ≤ 100KB
- iAgent calls this internally (localhost:3002) via Python urllib — no Kong auth needed
- sessionKey format for DingTalk users: "agent:main:dt-{dingTalkUserId}"
### agent-service: remove dead SkillManagerService
- Deleted skill-manager.service.ts (file-system .md loader, never called by anything)
- Removed from agent.module.ts provider list
- The live skill path is ClaudeAgentSdkEngine.loadTenantSkills() which reads directly
from the DB (it0_t_{tenantId}.skills) at task-execution time
### agent-service: clean up SystemPromptBuilder
- Removed unused skills?: string[] from SystemPromptContext (was never populated)
- Added clarifying comment: SDK engine handles skill injection, not this builder
## DB
- Inserted iAgent meta-skill "为小龙虾安装技能" ("Install a skill for 小龙虾") into it0_t_default.skills
(id: 79ac23ed-78c2-4d5f-8652-a99cf5185b61)
- Content instructs iAgent to: query user instances → generate SKILL.md → call
POST /api/v1/agent/instances/:id/skills via Python urllib heredoc
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Bridge: tag isTimeout=true in timeout callbacks for semantic error routing
- Agent-service: show "⏳ 还在努力想呢" ("still thinking hard") progress batchSend after 25s silence
- Agent-service: queue position feedback ("前面还有 N 条", i.e. "N messages ahead of you") via sessionWebhook
- Agent-service: buildErrorReply() maps timeout/disconnect/abort to distinct msgs
- Agent-service: instance status hints (stopped/starting/error) with action guidance
- Agent-service: all user-facing strings rewritten for conversational, friendly tone
- Agent-channel: pass isTimeout from bridge callback through to resolveCallbackReply
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause of "Bridge call failed" errors: the bridge /task endpoint defaults
to a 25s agent reply timeout, but LLM calls through the iConsulting gateway
can take 30-60s. Fix: pass timeoutSeconds=55 explicitly in the POST body.
Also add batchSend fallback in routeToAgent: if the sessionWebhook has
expired by the time the LLM replies (user sent a message, LLM took >30s,
webhook window closed), the reply is now sent via proactive batchSend
using senderStaffId instead of being silently dropped.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the voice agent triggers DingTalk OAuth, the user leaves the app
to authorize in DingTalk/browser, causing the LiveKit participant to
disconnect. The voice-agent then calls DELETE /voice to terminate the
session — but the user intends to return after completing OAuth.
Fix: mark the session as "oauth_pending" in VoiceSessionController when
oauth-trigger fires. If terminateVoiceSession is called while the flag
is active (10-min grace), suppress the terminate and return 200 OK so
the voice-agent exits cleanly. The session stays alive; when the user
returns to the voice screen, voice/start + inject auto-resume it.
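
The grace-period check can be sketched roughly as follows. `OAuthPendingTracker` and its methods are hypothetical names, and the injectable clock is only there to make the sketch testable; the real VoiceSessionController may store the flag differently.

```typescript
const OAUTH_GRACE_MS = 10 * 60 * 1000; // 10-minute grace window from the description above

// Hypothetical tracker for sessions that are waiting on an OAuth round trip.
class OAuthPendingTracker {
  private pendingUntil = new Map<string, number>();
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Called when oauth-trigger fires for a session.
  markPending(sessionId: string): void {
    this.pendingUntil.set(sessionId, this.now() + OAUTH_GRACE_MS);
  }

  // True when a terminate request should be answered with 200 OK but not acted on.
  shouldSuppressTerminate(sessionId: string): boolean {
    const deadline = this.pendingUntil.get(sessionId);
    if (deadline === undefined) return false;
    if (this.now() > deadline) {
      this.pendingUntil.delete(sessionId); // grace expired, terminate normally
      return false;
    }
    return true;
  }
}
```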
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two binding paths store different DingTalk ID types:
- OAuth binding stores staffId (resolved via unionId→userId at auth time)
- Code binding stores senderId ($:LWCP_v1:$... format from bot message)
DingTalk Stream API senderId != OAuth openId (different encodings), so
primary lookup by senderId always missed OAuth-bound instances, requiring
a fallback every time. Reverse the lookup order: try senderStaffId first
(direct hit for OAuth binding), fall back to senderId (code binding).
Also add MAX_RESPONSE_BYTES cap to httpPostJson — previously uncapped
unlike the DingTalk API helpers which already had the 256KB guard.
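
A minimal sketch of the reversed lookup order. The `Binding` shape and the synchronous `find` callback are assumptions made for illustration; the real lookup is presumably an async repository call.

```typescript
interface Binding {
  instanceId: string;
}

// Hypothetical lookup signature standing in for the real repository method.
type FindByDingTalkId = (id: string) => Binding | undefined;

function resolveBinding(
  find: FindByDingTalkId,
  senderStaffId: string | undefined,
  senderId: string | undefined,
): Binding | undefined {
  // OAuth bindings store staffId, so this is the direct hit for that path.
  if (senderStaffId) {
    const hit = find(senderStaffId);
    if (hit) return hit;
  }
  // Code bindings store the senderId taken from the bot message.
  return senderId ? find(senderId) : undefined;
}
```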
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The OpenClaw daemon checks the ANTHROPIC_API_KEY env var on startup. We were passing
CLAUDE_API_KEY, which OpenClaw ignores, so it fell back to auth-profiles.json
containing the raw Anthropic key, causing a 401 from the iConsulting LLM gateway.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OpenClaw reads its API key from auth-profiles.json. We were writing the raw Anthropic key
(sk-ant-api03-...), which the gateway doesn't recognize. Write effectiveApiKey
(the sk-gw-oc-... gateway key) instead so authentication with the iConsulting LLM gateway succeeds.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After container starts, sed-replace api.anthropic.com with iConsulting LLM gateway URL
in all models.generated.js files (ANTHROPIC_BASE_URL env alone is not enough since
baseUrl is hardcoded). Also create missing AGENTS.md template symlink so OpenClaw
does not 500 on workspace init.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs fixed:
1. findByDingTalkUserId now filters status != 'removed' so a re-bound new instance
is not shadowed by an old removed one with the same DingTalk user ID.
2. When an agent is deleted (removed), its dingtalkUserId is cleared so the
DingTalk ID is freed for reuse by the next binding.
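
Both fixes can be illustrated on an in-memory list. This is only a sketch: the `AgentInstance` shape and `markRemoved` helper are hypothetical, and the real code queries the database rather than an array.

```typescript
interface AgentInstance {
  id: string;
  status: string;
  dingTalkUserId: string | null;
}

// Fix 1: ignore removed instances when resolving a DingTalk user ID.
function findByDingTalkUserId(
  instances: AgentInstance[],
  dingTalkUserId: string,
): AgentInstance | undefined {
  return instances.find(
    (i) => i.dingTalkUserId === dingTalkUserId && i.status !== "removed",
  );
}

// Fix 2: clearing the ID on removal frees it for the next binding.
function markRemoved(instance: AgentInstance): AgentInstance {
  return { ...instance, status: "removed", dingTalkUserId: null };
}
```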
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OpenClaw runs as node user (uid 1000) but the host directory was created as root,
causing EACCES when the container tried to create /home/node/.openclaw/workspace.
Now mkdir workspace/ and chown -R 1000:1000 before starting the container.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Text sessions were not passing sessionId to SystemPromptBuilder, causing
Claude to use the `initiate_dingtalk_binding` custom tool (claude_api only).
When the engine is claude_agent_sdk, this tool does not exist → 404.
Fix: pass session.id as sessionId to systemPromptBuilder.build() in
agent.controller.ts. Claude will now use the wget oauth-trigger endpoint
for ALL session types (text and voice), which works with every engine.
Also: store userId (staffId) as the DingTalk binding ID when resolvable,
falling back to openId. Bot messages deliver senderStaffId which matches
userId, not openId — this prevents the "binding not found" routing failure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem: sendGreeting() was passing openId as `userIds` to batchSend, but
the API requires the enterprise staffId (userId). This caused HTTP 400
"staffId.notExisted" for every OAuth-bound greeting.
Fix:
1. completeOAuthBinding now resolves unionId → userId via
oapi.dingtalk.com/topapi/user/getbyunionid with corp app token.
Non-fatal: if the user has no enterprise context, greeting is skipped
with a clear log explaining why (no Contact.User.Read permission or
user is not an enterprise member).
2. sendGreeting accepts userId (staffId) and openId separately; uses
the correct staffId for batchSend. If userId is undefined, emits a
WARN and skips (user gets greeting on first message instead).
3. routeToAgent now tries senderStaffId as fallback if senderId lookup
misses — handles edge cases where DingTalk delivers staffId in senderId.
4. Added detailed logging: all three IDs (openId, unionId, userId) are
logged at binding time so future issues are immediately diagnosable.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Flutter:
- my_agents_page: refresh agent list on every My Agents tab tap
(ref.invalidate in ScaffoldWithNav.onDestinationSelected)
- chat_page + my_agents_page: activate AudioSession before launching OAuth
browser so iOS keeps network connections alive in background; deactivate
when app resumes or binding polling completes
agent-service deploy:
- Write openclaw.json with correct gateway token and auth-profiles.json
with API key BEFORE starting the container, so OpenClaw and bridge
always agree on the auth token (fixes token_mismatch on new deployments)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
openclaw-bridge:
- index.ts: /task endpoint now calls chatSendAndWait() with idempotencyKey
(removes broken timeoutSeconds param; uses caller-supplied msgId for dedup)
- openclaw-client.ts: added onEvent() subscription + chatSendAndWait() that
subscribes to 'chat' WS events, waits for state='final' matching runId,
and extracts text from the message payload
dingtalk-router:
- After OAuth binding completes, sends a proactive greeting to the user via
DingTalk batchSend API (/v1.0/robot/oToMessages/batchSend) introducing the
agent by name and explaining what it can do
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- DingTalk binding UX replaced with OAuth one-tap flow:
- GET /api/v1/agent/channels/dingtalk/oauth/init returns OAuth URL
- GET /api/v1/agent/channels/dingtalk/oauth/callback (public, no JWT)
exchanges code+state for openId, saves binding, returns HTML page
- oauthStates Map with 10-min TTL; state validated before exchange
- msg.senderId (openId) aligned with OAuth openId for consistent routing
- CODE_TTL_MS extended from 5→15 min (fallback code method preserved)
- Kong: dingtalk-oauth-public service declared before agent-service
so callback path matches without JWT plugin
- Voice sessions: use stored session.systemPrompt + voice rules;
allowedTools includes Bash so Claude can call internal APIs
- Flutter _DingTalkBindSheet: OAuth-first UX with code-based fallback
phases: idle→loadingOAuth→waitingOAuth→success + polling every 2s
- docker-compose: IT0_BASE_URL env var for agent-service (redirect URI)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add GET /api/v1/agent/instances/user/:userId endpoint so Claude can
look up the caller's agent instances without knowing the ID upfront
- Update SystemPromptBuilder DingTalk section with centralized binding
flow (one-time code via iAgent DingTalk bot, no per-instance creds)
- VoiceSessionController.startVoiceSession now extracts userId from JWT
and builds a full iAgent system prompt (userId + DingTalk instructions)
so Claude knows who is speaking and how to call the binding API
- VoiceSessionManager.executeTurn now uses the session's stored system
prompt (base context + voice rules) and allows the Bash tool so Claude
can call internal APIs via wget during voice conversations
User flow: speak "帮我绑定钉钉" ("Help me bind DingTalk") → Claude lists instances → generates
code via POST /api/v1/agent/channels/dingtalk/bind/:id → speaks code
letter-by-letter → user sends code in DingTalk → binding completes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Critical fixes:
- ws.on('message') fully wrapped in try/catch: an uncaught exception in
wsSend() can no longer propagate to the EventEmitter boundary and crash the process
- wsSend() helper: checks readyState === OPEN before send(), never throws
- Stale-WS guard: close/message events from old WS ignored after reconnect
(ws !== this.ws check); terminateCurrentWs() closes old WS before new one
- Queue tail: .catch(() => {}) appended to guarantee promise always resolves,
preventing permanently dead queue tail from silently dropping future tasks
- DISCONNECT frame handler: force-close + reconnect immediately
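
A rough sketch of a non-throwing send helper along the lines described above. `SocketLike` is a hypothetical minimal interface covering only the fields used here, and returning a boolean is an assumption of this sketch.

```typescript
// Minimal shape of the socket members this helper touches.
interface SocketLike {
  readyState: number;
  send(data: string): void;
}

const WS_OPEN = 1; // numeric value of WebSocket.OPEN

// Returns true if the frame was handed to the socket, false otherwise. Never throws.
function wsSend(ws: SocketLike | null, payload: unknown): boolean {
  if (!ws || ws.readyState !== WS_OPEN) return false;
  try {
    ws.send(JSON.stringify(payload));
    return true;
  } catch {
    return false; // swallow send errors so the message handler cannot crash
  }
}
```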
High fixes:
- sessionWebhookExpiredTime unit auto-detection: values < 1e11 treated as
seconds (×1000), values >= 1e11 treated as ms — prevents always-blocked reply
- httpsPost response capped at 256 KB to prevent memory spike on bad response
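
The unit auto-detection for sessionWebhookExpiredTime can be sketched as below. The function names are hypothetical; the 1e11 threshold is the one stated above.

```typescript
const SECONDS_THRESHOLD = 1e11;

// Values below the threshold are treated as seconds, everything else as milliseconds.
function toEpochMs(sessionWebhookExpiredTime: number): number {
  return sessionWebhookExpiredTime < SECONDS_THRESHOLD
    ? sessionWebhookExpiredTime * 1000
    : sessionWebhookExpiredTime;
}

// The webhook can still be used while the current time is before its expiry.
function isWebhookUsable(sessionWebhookExpiredTime: number, nowMs: number): boolean {
  return nowMs < toEpochMs(sessionWebhookExpiredTime);
}
```

For example, `toEpochMs(1700000000)` and `toEpochMs(1700000000000)` both yield `1700000000000`.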
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- agent-instance.controller.ts: accept dingTalkClientId/dingTalkClientSecret
in POST /instances body, forward to deploy service
- system-prompt-builder.ts: add DingTalk 5-step binding guide for iAgent
so the AI can walk users through connecting their DingTalk account
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
supervisord uses %(ENV_IT0_AGENT_SERVICE_URL)s expansion which fails
if the var is not present, crashing the entire supervisor process.
Add AGENT_SERVICE_PUBLIC_URL config and inject it via docker run -e.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
agent_instances is in public schema — no tenant context needed.
Fixes 'Tenant context not initialized' when iAgent calls internal API via Bash.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- SystemPromptBuilder: add userId/userEmail to context, expose internal API curl commands for OpenClaw creation
- agent.controller.ts: extract userId from JWT, build system prompt via SystemPromptBuilder so iAgent knows current user
- agent.module.ts: register SystemPromptBuilder as provider
- agent-instance.entity.ts: make serverHost/sshUser nullable (pool mode doesn't set these upfront)
- DB: ALTER TABLE agent_instances DROP NOT NULL on server_host/ssh_user
Now iAgent can create 小龙虾 instances autonomously when the user asks in natural language.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The it0hub org doesn't exist on Docker Hub. Switch to hailin168/openclaw-bridge:latest
which was built and pushed from openclaw source + IT0 bridge.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- agent-instance.controller: POST :id/heartbeat — bridge calls this every 60s;
auto-transitions status from deploying→running when gateway is confirmed connected
- system-prompt-builder: teach iAgent about OpenClaw deployment capability:
create/list/stop/remove instance API endpoints, when to trigger deployment,
and what to tell users about channel connectivity (Telegram/WhatsApp etc.)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Flutter: language='auto' omits the language field → backend receives none
- Backend: no language field → passes undefined to STT service
- STT service: language=undefined → omits language param from Whisper request
- Whisper auto-detects language per utterance when no hint is provided
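
The chain above amounts to "omit the field when there is no hint". A minimal sketch that collapses the three layers into one hypothetical helper, `buildTranscriptionFields`, for illustration only:

```typescript
// Builds the text fields for a transcription request.
// The language field is left out entirely for 'auto' or undefined so Whisper auto-detects.
function buildTranscriptionFields(
  model: string,
  language?: string,
): Record<string, string> {
  const fields: Record<string, string> = { model };
  if (language !== undefined && language !== "auto") {
    fields.language = language;
  }
  return fields;
}
```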
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Node 18 native fetch (undici) ignores https.Agent, causing "fetch failed" errors
on the self-signed proxy at 67.223.119.33:8443. Switch to https.request
with rejectUnauthorized: false, which works reliably.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OPENAI_BASE_URL=https://67.223.119.33:8443/v1 already includes /v1,
so the URL was being built as .../v1/v1/audio/transcriptions.
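
One way to avoid the doubled segment, sketched under the assumption that the base URL may or may not already end in /v1. `transcriptionsUrl` is a hypothetical helper and not necessarily how the commit fixes it.

```typescript
// Builds the transcription endpoint without duplicating the /v1 segment.
function transcriptionsUrl(baseUrl: string): string {
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const withVersion = trimmed.endsWith("/v1") ? trimmed : `${trimmed}/v1`;
  return `${withVersion}/audio/transcriptions`;
}
```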
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add POST /api/v1/agent/transcribe endpoint (STT only, no agent trigger)
- Add transcribeAudio() to chat datasource and provider
- VoiceMicButton now fills the text input field with transcript;
user reviews and sends manually
- Add OPENAI_API_KEY/OPENAI_BASE_URL to agent-service in docker-compose
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New endpoint: POST /api/v1/agent/sessions/:sessionId/voice-message
- Accepts multipart/form-data audio file (any format Whisper supports)
- Transcribes via OpenAI Whisper API (routed through existing proxy)
- If a task is currently running in the session → hard-interrupts it first
(same cancel+inject pattern as text inject, triggered by voice command)
- Otherwise → starts a fresh task with the transcript
- Returns { sessionId, taskId, transcript } so client can subscribe to WS stream
This enables WhatsApp-style push-to-talk and doubles as an async voice
interrupt into any active agent workflow, bypassing the need for speaker
diarization (whoever presses record owns the message).
New files:
infrastructure/stt/openai-stt.service.ts — OpenAI Whisper client,
manually builds multipart/form-data, supports self-signed proxy cert
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements a two-level abort controller design to support real-time
interruption when the user speaks while the agent is still responding:
sessionAbortController (session-scoped)
- Created once when startSession() is called
- Fired only by terminateSession() (user hangs up)
- Propagated into each turn via addEventListener
turnAbort (per-turn, stored as handle.currentTurnAbort)
- Created fresh at the start of each executeTurn() call
- Stored on the VoiceSessionHandle so injectMessage() can abort it
- When a new inject arrives while a turn is running, injectMessage()
calls turnAbort.abort() BEFORE enqueuing the new message
Interruption flow:
1. User speaks mid-response → LiveKit stops TTS playback (client-side)
2. STT utterance → POST voice/inject → injectMessage() fires
3. handle.currentTurnAbort.abort() called → sets aborted flag
4. for-await loop checks turnAbort.signal.aborted on next SDK event → break
5. catch block NOT reached (break ≠ exception) → no error event emitted
6. finally block saves partial text with "[中断]" ("interrupted") suffix to history
7. New message dequeued → fresh executeTurn() starts immediately
Why no "Agent error" message plays to the user:
- break exits the for-await loop silently, not via exception
- The catch block's error-event emission is guarded by err?.name !== 'AbortError'
AND requires an actual exception; a plain break never enters catch
- Empty or partial responses are filtered by `if response:` in agent.py
Also update module-level JSDoc with full architecture explanation covering
the long-lived run loop design, two-level abort hierarchy, tenant context
injection pattern, and SDK session resume across turns.
Update agent.py module docstring to document voice session lifecycle and
interruption flow for future maintainers.
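
The two-level abort hierarchy can be sketched as follows. This is a simplified, synchronous illustration: `createTurnAbort` and `consumeTurn` are hypothetical names, and the real loop is a for-await over SDK events rather than a plain iterable.

```typescript
// Creates a per-turn controller that is also aborted when the session controller fires.
function createTurnAbort(session: AbortController): AbortController {
  const turn = new AbortController();
  if (session.signal.aborted) {
    turn.abort();
  } else {
    session.signal.addEventListener("abort", () => turn.abort(), { once: true });
  }
  return turn;
}

// Consumes chunks until the turn is aborted. `break` leaves the loop without throwing,
// so no error path runs; partial text gets the "[中断]" (interrupted) suffix.
function consumeTurn(events: Iterable<string>, turn: AbortController): string {
  let text = "";
  for (const chunk of events) {
    if (turn.signal.aborted) break;
    text += chunk;
  }
  return turn.signal.aborted ? `${text}[中断]` : text;
}
```

Aborting a turn leaves the session controller untouched, so the next injected message can start a fresh turn on the same session.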
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the per-turn POST /tasks approach for voice calls with a
long-lived agent run loop tied to the call lifecycle:
agent-service:
- Add AsyncQueue<T> utility for blocking message relay
- Add VoiceSessionManager: spawns one background run loop per voice call,
accepts injected messages, terminates cleanly on hangup
- Add VoiceSessionController with 3 endpoints:
POST /api/v1/agent/sessions/voice/start (call start)
POST /api/v1/agent/sessions/:id/voice/inject (each speech turn)
DELETE /api/v1/agent/sessions/:id/voice (user hung up)
- Register VoiceSessionManager + VoiceSessionController in agent.module.ts
voice-agent:
- AgentServiceLLM: add start_voice_session(), terminate_voice_session(),
inject_text_message() (voice/inject-aware), _do_inject_voice()
- AgentServiceLLMStream._run(): use voice/inject path when voice session
is active; fall back to per-task POST for text-chat / non-SDK engines
- entrypoint(): call start_voice_session() after session.start();
register _on_room_disconnect that calls terminate_voice_session()
so the agent is always killed when the user hangs up
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two issues fixed:
1. agent.controller.ts — on the FIRST task of each session, write title+voiceMode
into session.metadata so the client can display a meaningful conversation title:
- Text sessions: metadata.title = first 40 chars of user prompt
- Voice sessions: metadata.title = '' and metadata.voiceMode = true
(Flutter renders these as '语音对话 M/D HH:mm', i.e. "Voice conversation M/D HH:mm")
titleSet flag prevents overwriting the title on subsequent turns of the same session.
2. session.controller.ts — listSessions() now returns a DTO instead of the raw entity.
systemPrompt is an internal engine instruction and is explicitly excluded from the
response. The client receives { id, status, engineType, metadata, createdAt, updatedAt }.
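
A minimal sketch of the first-task title logic from item 1. Storing `titleSet` inside the metadata object is an assumption of this sketch, and `applyFirstTaskMetadata` is a hypothetical name.

```typescript
interface SessionMetadata {
  title?: string;
  voiceMode?: boolean;
  titleSet?: boolean; // assumed location of the flag
}

// Sets the title on the first task only; later turns return the metadata unchanged.
function applyFirstTaskMetadata(
  metadata: SessionMetadata,
  prompt: string,
  isVoice: boolean,
): SessionMetadata {
  if (metadata.titleSet) return metadata;
  return isVoice
    ? { ...metadata, title: "", voiceMode: true, titleSet: true }
    : { ...metadata, title: prompt.slice(0, 40), titleSet: true };
}
```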