feat(agent): voice-triggered DingTalk binding + GET instances by user
- Add GET /api/v1/agent/instances/user/:userId endpoint so Claude can look up the caller's agent instances without knowing the ID upfront - Update SystemPromptBuilder DingTalk section with centralized binding flow (one-time code via iAgent DingTalk bot, no per-instance creds) - VoiceSessionController.startVoiceSession now extracts userId from JWT and builds a full iAgent system prompt (userId + DingTalk instructions) so Claude knows who is speaking and how to call the binding API - VoiceSessionManager.executeTurn now uses the session's stored system prompt (base context + voice rules) and allows the Bash tool so Claude can call internal APIs via wget during voice conversations User flow: speak "帮我绑定钉钉" → Claude lists instances → generates code via POST /api/v1/agent/channels/dingtalk/bind/:id → speaks code letter-by-letter → user sends code in DingTalk → binding completes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
parent
db0e1f1439
commit
2d0bdbd27f
|
|
@ -305,13 +305,21 @@ export class VoiceSessionManager {
|
||||||
this.logger.log(`[VoiceSession ${sessionId}] Resuming SDK session: ${resumeSessionId}`);
|
this.logger.log(`[VoiceSession ${sessionId}] Resuming SDK session: ${resumeSessionId}`);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Voice-mode system prompt: concise oral Chinese, no markdown / tool details
|
// Voice-mode rules: always appended to whatever base system prompt the session has.
|
||||||
const voiceSystemPrompt =
|
// These keep responses concise and audio-friendly.
|
||||||
'你正在通过语音与用户实时对话。请严格遵守以下规则:\n' +
|
const voiceRules =
|
||||||
|
'\n\n## 语音对话规则(必须严格遵守)\n' +
|
||||||
'1. 只输出用户关注的最终答案,不要输出工具调用过程、中间步骤或技术细节\n' +
|
'1. 只输出用户关注的最终答案,不要输出工具调用过程、中间步骤或技术细节\n' +
|
||||||
'2. 用简洁自然的口语中文回答,像面对面对话一样\n' +
|
'2. 用简洁自然的口语中文回答,像面对面对话一样\n' +
|
||||||
'3. 回复要简短精炼,适合语音播报,通常1-3句话即可\n' +
|
'3. 回复要简短精炼,适合语音播报,通常1-3句话即可\n' +
|
||||||
'4. 不要使用markdown格式、代码块、列表符号等文本格式';
|
'4. 不要使用markdown格式、代码块、列表符号等文本格式\n' +
|
||||||
|
'5. 朗读验证码时,每个字符之间要有明显停顿,例如 "A - 3 - F - 9 - C - 2"';
|
||||||
|
|
||||||
|
// Combine the session's full iAgent system prompt (with userId, DingTalk instructions,
|
||||||
|
// etc.) with the voice-specific rules. Fall back to voice rules only if no stored prompt.
|
||||||
|
const voiceSystemPrompt = session.systemPrompt
|
||||||
|
? `${session.systemPrompt}${voiceRules}`
|
||||||
|
: `你是iAgent服务器运维助手。${voiceRules}`;
|
||||||
|
|
||||||
// Events to suppress in voice mode (only text/completed/error reach TTS)
|
// Events to suppress in voice mode (only text/completed/error reach TTS)
|
||||||
const voiceFilteredTypes = new Set(['thinking', 'tool_use', 'tool_result']);
|
const voiceFilteredTypes = new Set(['thinking', 'tool_use', 'tool_result']);
|
||||||
|
|
@ -324,7 +332,8 @@ export class VoiceSessionManager {
|
||||||
sessionId,
|
sessionId,
|
||||||
prompt: message,
|
prompt: message,
|
||||||
systemPrompt: voiceSystemPrompt,
|
systemPrompt: voiceSystemPrompt,
|
||||||
allowedTools: [],
|
// Allow Bash so Claude can call internal APIs (e.g., DingTalk binding via wget)
|
||||||
|
allowedTools: ['Bash'],
|
||||||
maxTurns: 10,
|
maxTurns: 10,
|
||||||
conversationHistory: historyForEngine.length > 0 ? historyForEngine : undefined,
|
conversationHistory: historyForEngine.length > 0 ? historyForEngine : undefined,
|
||||||
resumeSessionId,
|
resumeSessionId,
|
||||||
|
|
|
||||||
|
|
@ -45,15 +45,25 @@ export class SystemPromptBuilder {
|
||||||
'1. Ask for a name if not given\n' +
|
'1. Ask for a name if not given\n' +
|
||||||
'2. Use the Current User ID from this prompt as userId\n' +
|
'2. Use the Current User ID from this prompt as userId\n' +
|
||||||
'3. Call the create API with Bash and report the result (id, status, containerName)\n\n' +
|
'3. Call the create API with Bash and report the result (id, status, containerName)\n\n' +
|
||||||
'## DingTalk Channel Binding (钉钉接入)\n' +
|
'## DingTalk Channel Binding (钉钉接入 — 语音/对话绑定)\n' +
|
||||||
'To bind a DingTalk bot to an OpenClaw instance, the user needs a DingTalk app with Stream mode enabled:\n' +
|
'iAgent has a centralized DingTalk bot. Users bind their OpenClaw instance to it using a one-time code.\n' +
|
||||||
' Step 1: Go to https://open.dingtalk.com/developer → "创建应用" → Enterprise Internal App\n' +
|
'When a user asks to bind DingTalk (e.g. "帮我绑定钉钉", "bind DingTalk", "连接钉钉"), follow these steps:\n\n' +
|
||||||
' Step 2: Enable "机器人" → set message receive mode to "Stream模式"\n' +
|
' Step 1 — Find the user\'s active instance:\n' +
|
||||||
' Step 3: Copy the AppKey (ClientId) and AppSecret (ClientSecret)\n' +
|
' wget -q -O- http://localhost:3002/api/v1/agent/instances/user/<userId>\n' +
|
||||||
' Step 4: Create/re-deploy the OpenClaw instance with DingTalk creds:\n' +
|
' Pick the first instance with status "running". If none running, pick the first one.\n' +
|
||||||
' wget -q -O- --post-data=\'{"name":"<name>","userId":"<userId>","usePool":true,"dingTalkClientId":"<appKey>","dingTalkClientSecret":"<appSecret>"}\' \\\n' +
|
' If the user has no instances, tell them to create one first.\n\n' +
|
||||||
' --header="Content-Type: application/json" http://localhost:3002/api/v1/agent/instances\n' +
|
' Step 2 — Generate a binding code for the instance:\n' +
|
||||||
' Step 5: Tell the user to add the bot to a DingTalk group or send it a direct message — it will respond via OpenClaw AI.',
|
' wget -q -O- --post-data="" http://localhost:3002/api/v1/agent/channels/dingtalk/bind/<instanceId>\n' +
|
||||||
|
' This returns JSON: { "code": "A3F9C2", "expiresAt": "..." }\n\n' +
|
||||||
|
' Step 3 — Tell the user clearly (voice-friendly):\n' +
|
||||||
|
' Speak the code letter-by-letter with pauses: "验证码是 A-3-F-9-C-2"\n' +
|
||||||
|
' Instructions: "请在钉钉中找到 iAgent 机器人,向它发送这6位验证码。发送后绑定会自动完成,有效期15分钟。"\n' +
|
||||||
|
' If speaking via voice: spell each character slowly for the user to type easily.\n\n' +
|
||||||
|
' Check binding status (optional, if user asks):\n' +
|
||||||
|
' wget -q -O- http://localhost:3002/api/v1/agent/channels/dingtalk/status/<instanceId>\n' +
|
||||||
|
' Returns { "bound": true/false }\n\n' +
|
||||||
|
' Unbind DingTalk:\n' +
|
||||||
|
' wget -q -O- --post-data="" http://localhost:3002/api/v1/agent/channels/dingtalk/unbind/<instanceId>',
|
||||||
);
|
);
|
||||||
|
|
||||||
// Tenant + user context
|
// Tenant + user context
|
||||||
|
|
|
||||||
|
|
@ -32,6 +32,12 @@ export class AgentInstanceController {
|
||||||
return this.instanceRepo.findAll();
|
return this.instanceRepo.findAll();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@Get('user/:userId')
|
||||||
|
async listByUser(@Param('userId') userId: string) {
|
||||||
|
const instances = await this.instanceRepo.findByUserId(userId);
|
||||||
|
return instances.map((inst) => this.sanitize(inst));
|
||||||
|
}
|
||||||
|
|
||||||
@Get(':id')
|
@Get(':id')
|
||||||
async getOne(@Param('id') id: string) {
|
async getOne(@Param('id') id: string) {
|
||||||
const inst = await this.instanceRepo.findById(id);
|
const inst = await this.instanceRepo.findById(id);
|
||||||
|
|
|
||||||
|
|
@ -12,12 +12,14 @@
|
||||||
* the user speaks, and calls voice/terminate on room disconnect.
|
* the user speaks, and calls voice/terminate on room disconnect.
|
||||||
*/
|
*/
|
||||||
import {
|
import {
|
||||||
Controller, Post, Delete, Param, Body,
|
Controller, Post, Delete, Param, Body, Req,
|
||||||
NotFoundException, BadRequestException, Logger,
|
NotFoundException, BadRequestException, Logger,
|
||||||
} from '@nestjs/common';
|
} from '@nestjs/common';
|
||||||
|
import { Request } from 'express';
|
||||||
import { TenantId } from '@it0/common';
|
import { TenantId } from '@it0/common';
|
||||||
import { VoiceSessionManager } from '../../../domain/services/voice-session-manager.service';
|
import { VoiceSessionManager } from '../../../domain/services/voice-session-manager.service';
|
||||||
import { SessionRepository } from '../../../infrastructure/repositories/session.repository';
|
import { SessionRepository } from '../../../infrastructure/repositories/session.repository';
|
||||||
|
import { SystemPromptBuilder } from '../../../infrastructure/engines/claude-code-cli/system-prompt-builder';
|
||||||
import { AgentEngineType } from '../../../domain/value-objects/agent-engine-type.vo';
|
import { AgentEngineType } from '../../../domain/value-objects/agent-engine-type.vo';
|
||||||
import { AgentSession } from '../../../domain/entities/agent-session.entity';
|
import { AgentSession } from '../../../domain/entities/agent-session.entity';
|
||||||
import * as crypto from 'crypto';
|
import * as crypto from 'crypto';
|
||||||
|
|
@ -29,6 +31,7 @@ export class VoiceSessionController {
|
||||||
constructor(
|
constructor(
|
||||||
private readonly voiceSessionManager: VoiceSessionManager,
|
private readonly voiceSessionManager: VoiceSessionManager,
|
||||||
private readonly sessionRepository: SessionRepository,
|
private readonly sessionRepository: SessionRepository,
|
||||||
|
private readonly systemPromptBuilder: SystemPromptBuilder,
|
||||||
) {}
|
) {}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
|
|
@ -41,6 +44,7 @@ export class VoiceSessionController {
|
||||||
@Post('voice/start')
|
@Post('voice/start')
|
||||||
async startVoiceSession(
|
async startVoiceSession(
|
||||||
@TenantId() tenantId: string,
|
@TenantId() tenantId: string,
|
||||||
|
@Req() req: Request,
|
||||||
@Body() body: { sessionId?: string; systemPrompt?: string },
|
@Body() body: { sessionId?: string; systemPrompt?: string },
|
||||||
) {
|
) {
|
||||||
let session: AgentSession | null = null;
|
let session: AgentSession | null = null;
|
||||||
|
|
@ -53,6 +57,19 @@ export class VoiceSessionController {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Extract user identity from JWT for context-aware system prompt
|
||||||
|
const jwtUser = (req as any).user;
|
||||||
|
const userId: string | undefined = jwtUser?.sub ?? jwtUser?.userId;
|
||||||
|
const userEmail: string | undefined = jwtUser?.email;
|
||||||
|
|
||||||
|
// Build the full iAgent system prompt (includes DingTalk binding instructions, userId, etc.)
|
||||||
|
// If the caller explicitly overrides with their own systemPrompt, use that instead.
|
||||||
|
const builtSystemPrompt = body.systemPrompt || this.systemPromptBuilder.build({
|
||||||
|
tenantId,
|
||||||
|
userId,
|
||||||
|
userEmail,
|
||||||
|
});
|
||||||
|
|
||||||
if (!session) {
|
if (!session) {
|
||||||
// Create a fresh session pre-marked as voice mode
|
// Create a fresh session pre-marked as voice mode
|
||||||
session = new AgentSession();
|
session = new AgentSession();
|
||||||
|
|
@ -60,7 +77,7 @@ export class VoiceSessionController {
|
||||||
session.tenantId = tenantId;
|
session.tenantId = tenantId;
|
||||||
session.engineType = AgentEngineType.CLAUDE_AGENT_SDK;
|
session.engineType = AgentEngineType.CLAUDE_AGENT_SDK;
|
||||||
session.status = 'active';
|
session.status = 'active';
|
||||||
session.systemPrompt = body.systemPrompt;
|
session.systemPrompt = builtSystemPrompt;
|
||||||
session.metadata = { voiceMode: true, title: '', titleSet: true };
|
session.metadata = { voiceMode: true, title: '', titleSet: true };
|
||||||
session.createdAt = new Date();
|
session.createdAt = new Date();
|
||||||
session.updatedAt = new Date();
|
session.updatedAt = new Date();
|
||||||
|
|
|
||||||
Loading…
Reference in New Issue