Commit Graph

19 Commits

Author SHA1 Message Date
hailin 8865985019 feat(agent-instance-chat): 实现用户与自己的 OpenClaw 智能体直接对话功能
## 功能概述
用户可在「我的智能体」页面点击运行中的 OpenClaw 实例卡片,
直接打开与该智能体的专属对话页面,完整复用 iAgent 的聊天 UI
(流式输出、工具时间线、审批卡片、语音输入等),同时保证
iAgent 对话完全不受影响。

## 架构设计
- 使用 Riverpod ProviderScope 子作用域覆盖 chatRemoteDatasourceProvider
  / chatProvider / sessionListProvider,实现 iAgent 与实例对话的
  provider 完全隔离,无任何共享状态。
- OpenClaw bridge 采用已有的 /task-async 异步回调模式:
    Flutter → POST /api/v1/agent/instances/:id/tasks(立即返回 sessionId/taskId)
    → 订阅 WS /ws/agent(等待事件)
    → Bridge 完成后 POST /api/v1/agent/instances/openclaw-app-callback(公开端点)
    → 后端发 WS text+completed 事件 → Flutter 收到回复
- 每个实例的会话通过 agent_sessions.agent_instance_id 字段隔离,
  会话抽屉只显示当前实例的历史记录。

## 后端变更
### packages/shared/database/src/migrations/013-add-agent-instance-id-to-sessions.sql
- 新增迁移:ALTER TABLE agent_sessions ADD COLUMN agent_instance_id UUID NULL
- 为按实例过滤会话建立索引

### packages/services/agent-service/src/domain/entities/agent-session.entity.ts
- 新增可选字段 agentInstanceId: string(对应 agent_instance_id 列)
- iAgent 会话该字段为 null;实例聊天会话存储对应的 instance UUID

### packages/services/agent-service/src/infrastructure/repositories/session.repository.ts
- 新增 findByInstanceId(tenantId, agentInstanceId) 方法
- 用于 GET /instances/:id/sessions 按实例过滤会话列表

### packages/services/agent-service/src/interfaces/rest/controllers/agent.controller.ts
新增三个端点(注意:已知存在以下待修复问题,见后续 fix commit):
1. POST /api/v1/agent/instances/:instanceId/tasks
   - 校验 instance 归属(userId 匹配)和 running 状态
   - 创建会话(engineType='openclaw',携带 agentInstanceId)
   - 保存用户消息到 conversation_messages 表
   - 向 OpenClaw bridge POST /task-async,sessionKey=it0:{sessionId}
   - 立即返回 { sessionId, taskId },Flutter 订阅 WS 等待回调
2. GET /api/v1/agent/instances/:instanceId/sessions
   - 返回该实例的会话列表(含 title/status/时间戳)
3. POST /api/v1/agent/instances/openclaw-app-callback(公开端点,无 JWT)
   - bridge 完成后回调此端点
   - 成功:发 WS text+completed 事件,保存 assistant 消息,更新 task 状态
   - 失败/超时:发 WS error 事件,标记 task 为 FAILED
- 注入 AgentInstanceRepository 依赖
- 新增私有方法 createInstanceSession()

### packages/gateway/config/kong.yml
- 新增 openclaw-app-callback-public service(无 JWT 插件)
- 路由:POST /api/v1/agent/instances/openclaw-app-callback
- 必须在 agent-service 之前声明,确保路由优先匹配(同 wecom-public 模式)

## Flutter 变更
### it0_app/lib/core/config/api_endpoints.dart
- 新增 instanceTasks(instanceId) 和 instanceSessions(instanceId) 静态方法

### it0_app/lib/features/chat/presentation/pages/chat_page.dart
- 新增可选参数 agentName(默认 null = iAgent 模式)
- agentName != null 时:AppBar 显示智能体名称,隐藏语音通话按钮
- 不传 agentName 时行为与原来完全一致,iAgent 功能零影响

### it0_app/lib/features/my_agents/presentation/pages/my_agents_page.dart
- _InstanceCard 新增 onTap 回调参数
- 卡片用 Material+InkWell 包裹,支持圆角水波纹点击效果
- 新增 _openInstanceChat() 顶层函数:
    running → 滑入式跳转到 AgentInstanceChatPage
    其他状态 → SnackBar 提示(部署中/已停止/错误)
- 导入 AgentInstanceChatPage

### it0_app/lib/features/agent_instance_chat/(新建功能模块)
data/datasources/agent_instance_chat_remote_datasource.dart:
- AgentInstanceChatDatasource implements ChatRemoteDatasource
- 通过组合模式包装 ChatRemoteDatasource 委托所有通用操作
- 覆盖 createTask → POST /api/v1/agent/instances/:id/tasks
- 覆盖 listSessions → GET /api/v1/agent/instances/:id/sessions(仅当前实例会话)

presentation/pages/agent_instance_chat_page.dart:
- AgentInstanceChatPage(instance: AgentInstance)
- ProviderScope 子作用域覆盖三个 provider 实现完全隔离:
    chatRemoteDatasourceProvider → AgentInstanceChatDatasource
    chatProvider → 独立 ChatNotifier 实例(与 iAgent 零共享)
    sessionListProvider → 仅当前实例的会话列表
- child: ChatPage(agentName: instance.name) 完整复用 UI

## 已知待修复问题(下一个 commit)
1. [安全] 鉴权检查逻辑:if (userId && ...) 应为 if (!userId || ...)
2. [可靠性] fetch 未处理 HTTP 4xx/5xx 错误,任务可能永久挂起
3. [可靠性] bridge 回调无超时机制,bridge 崩溃后任务永久 RUNNING
4. [UX] robotStateProvider 未在子 ProviderScope 覆盖,头像动画反映 iAgent 状态
5. [UX] 实例聊天附件 UI 未禁用,上传附件被静默丢弃
6. [UX] 语音消息在实例模式下错误路由到 iAgent 引擎(非 OpenClaw)
7. [DB] 002 模板未加 agent_instance_id 列,新租户缺失此字段

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 19:30:38 -07:00
hailin f186c57afb fix(agent): decode JWT directly to get userId for system prompt
req.user is never populated in agent-service (Kong verifies JWT, no Passport strategy).
This caused userId to always be undefined → system prompt had no 'Current User ID' →
Claude used tenant slug 'shenzhengj' as userId → DB error 'invalid input syntax for
type uuid'.

Fix: decode JWT payload from Authorization header (no signature verify needed — Kong
already verified it) to extract sub (user UUID) for both AgentController and
VoiceSessionController.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 08:51:28 -07:00
hailin 495407d25b fix(agent): pass sessionId to system prompt for text chat OAuth trigger
Text sessions were not passing sessionId to SystemPromptBuilder, causing
Claude to use the `initiate_dingtalk_binding` custom tool (claude_api only).
When the engine is claude_agent_sdk, this tool does not exist → 404.

Fix: pass session.id as sessionId to systemPromptBuilder.build() in
agent.controller.ts. Claude will now use the wget oauth-trigger endpoint
for ALL session types (text and voice), which works with every engine.

Also: store userId (staffId) as the DingTalk binding ID when resolvable,
falling back to openId. Bot messages deliver senderStaffId which matches
userId, not openId — this prevents the "binding not found" routing failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 14:20:58 -07:00
hailin b87cebf465 feat(agent): inject userId into system prompt + fix agent-instance nullable columns
- SystemPromptBuilder: add userId/userEmail to context, expose internal API curl commands for OpenClaw creation
- agent.controller.ts: extract userId from JWT, build system prompt via SystemPromptBuilder so iAgent knows current user
- agent.module.ts: register SystemPromptBuilder as provider
- agent-instance.entity.ts: make serverHost/sshUser nullable (pool mode doesn't set these upfront)
- DB: ALTER TABLE agent_instances DROP NOT NULL on server_host/ssh_user

Now iAgent can create 小龙虾 instances autonomously when user asks in natural language.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 03:05:15 -07:00
hailin 4c7c05eb37 feat(stt): support auto language detection for mixed Chinese-English input
- Flutter: language='auto' omits the language field → backend receives none
- Backend: no language field → passes undefined to STT service
- STT service: language=undefined → omits language param from Whisper request
- Whisper auto-detects language per utterance when no hint is provided

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 08:13:26 -08:00
hailin 2182149c4c feat(chat): voice-to-text fills input box instead of auto-sending
- Add POST /api/v1/agent/transcribe endpoint (STT only, no agent trigger)
- Add transcribeAudio() to chat datasource and provider
- VoiceMicButton now fills the text input field with transcript;
  user reviews and sends manually
- Add OPENAI_API_KEY/OPENAI_BASE_URL to agent-service in docker-compose

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 07:01:39 -08:00
hailin a2af76bcd7 feat(agent-service): add voice message endpoint with Whisper STT and async interrupt
New endpoint: POST /api/v1/agent/sessions/:sessionId/voice-message
- Accepts multipart/form-data audio file (any format Whisper supports)
- Transcribes via OpenAI Whisper API (routed through existing proxy)
- If a task is currently running in the session → hard-interrupts it first
  (same cancel+inject pattern as text inject, triggered by voice command)
- Otherwise → starts a fresh task with the transcript
- Returns { sessionId, taskId, transcript } so client can subscribe to WS stream

This enables WhatsApp-style push-to-talk and doubles as an async voice
interrupt into any active agent workflow, bypassing the need for speaker
diarization (whoever presses record owns the message).

New files:
  infrastructure/stt/openai-stt.service.ts — OpenAI Whisper client,
  manually builds multipart/form-data, supports self-signed proxy cert

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 03:12:03 -08:00
hailin 6ca8aab243 fix(agent-service): store proper title in session metadata, exclude systemPrompt from list API
Two issues fixed:

1. agent.controller.ts — on the FIRST task of each session, write title+voiceMode
   into session.metadata so the client can display a meaningful conversation title:
     - Text sessions: metadata.title = first 40 chars of user prompt
     - Voice sessions: metadata.title = '' + metadata.voiceMode = true
       (Flutter renders these as '语音对话 M/D HH:mm')
   titleSet flag prevents overwriting the title on subsequent turns of the same session.

2. session.controller.ts — listSessions() now returns a DTO instead of the raw entity.
   systemPrompt is an internal engine instruction and is explicitly excluded from the
   response. The client receives { id, status, engineType, metadata, createdAt, updatedAt }.
2026-03-04 02:39:47 -08:00
hailin 9ed80cd0bc feat: implement complete commercial monetization loop (Phases 1-4)
## Phase 1 - Token Metering + Quota Enforcement

### Usage Tracking
- agent-service: add UsageRecord entity (per-tenant schema) tracking
  inputTokens/outputTokens/costUsd per AI task
- Modify all 3 AI engines (claude-api, claude-code-cli, claude-agent-sdk)
  to emit separate input/output token counts in the `completed` event
- claude-api-engine: costUsd = (input*3 + output*15) / 1,000,000
  (claude-sonnet-4-5 pricing: $3/MTok in, $15/MTok out)
- agent.controller: persist UsageRecord and publish `usage.recorded`
  event to Redis Streams on every task completion (non-blocking)
- shared/events: new events UsageRecordedEvent, SubscriptionChangedEvent,
  QuotaExceededEvent, PaymentReceivedEvent

### Quota Enforcement
- TenantInfo: add maxServers, maxUsers, maxStandingOrders,
  maxAgentTokensPerMonth fields
- TenantContextMiddleware: rewritten to query public.tenants table for
  real quota values; 5-min in-memory cache; plan-based fallback on error
- TenantContextService: getTenant() returns null instead of throwing;
  added getTenantOrThrow() for strict callers
- inventory-service/server.controller: 429 when maxServers exceeded
- ops-service/standing-order.controller: 429 when maxStandingOrders exceeded
- auth-service/auth.service: 429 when maxUsers exceeded
- 002-create-tenant-schema-template.sql: add usage_records table

## Phase 2 - billing-service (New Microservice, port 3010)

### Domain Layer (public schema, all UUIDs)
Entities: Plan, Subscription, Invoice, InvoiceItem, Payment, PaymentMethod,
UsageAggregate

Domain services:
- SubscriptionLifecycleService: full state machine (trialing -> active ->
  past_due -> cancelled/expired); upgrades immediate, downgrades at period end
- InvoiceGeneratorService: monthly invoice = base fee + overage charges;
  proration item for mid-cycle upgrades
- OverageCalculatorService: (totalTokens - includedTokens) * overageRate

### Infrastructure (all repos use DataSource directly, NOT TenantAwareRepository)
- PlanRepository, SubscriptionRepository, InvoiceRepository (atomic
  transaction for invoice+items), PaymentRepository (payments + methods),
  UsageAggregateRepository (UPSERT via ON CONFLICT for atomic accumulation)

### Application Use Cases
- CreateSubscriptionUseCase: called on tenant registration
- ChangePlanUseCase: upgrade (immediate + proration) or downgrade (scheduled)
- CancelSubscriptionUseCase: immediate or at-period-end
- GenerateMonthlyInvoiceUseCase: cron target (1st of month 00:05 UTC);
  generates invoices, renews periods, applies scheduled downgrades
- AggregateUsageUseCase: Redis Streams consumer group billing-service,
  upserts monthly usage aggregates from usage.recorded events
- CheckTokenQuotaUseCase: hard limit enforcement per plan
- CreatePaymentSessionUseCase + HandlePaymentWebhookUseCase

### REST API
- GET  /api/v1/billing/plans
- GET/POST /api/v1/billing/subscription (+ /upgrade, /cancel)
- GET  /api/v1/billing/invoices (paginated)
- GET  /api/v1/billing/invoices/:id
- POST /api/v1/billing/invoices/:id/pay
- GET  /api/v1/billing/usage/current + /history
- CRUD /api/v1/billing/payment-methods
- POST /api/v1/billing/webhooks/{stripe,alipay,wechat,crypto}

### Plan Seed (auto on startup via PlanSeedService)
- free:       $0/mo,    100K tokens,  no overage,  hard limit 100%
- pro:        $49.99/mo, 1M tokens,  $8/MTok,  hard limit 150%
- enterprise: $199.99/mo, 10M tokens, $5/MTok, no hard limit

## Phase 3 - Payment Provider Integration

### PaymentProviderRegistry (Strategy Pattern, mirrors EngineRegistry)
All providers use @Optional() injection; unconfigured providers omitted

- StripeProvider: PaymentIntent API; webhook via stripe.webhooks.constructEvent
- AlipayProvider: alipay-sdk; Native QR (precreate); RSA2 signature verify
- WeChatPayProvider: v3 REST; Native Pay code_url; AES-256-GCM decrypt;
  HMAC-SHA256 request signing and webhook verification
- CryptoProvider: Coinbase Commerce; hosted checkout; HMAC-SHA256 verify

### WebhookController
All 4 webhook endpoints are public (no JWT) for payment provider callbacks.
rawBody: true enabled in main.ts for signature verification.

## Infrastructure Changes
- docker-compose.yml: billing-service container (port 13010);
  added as dependency of api-gateway
- kong.yml: /api/v1/billing routes (JWT); /api/v1/billing/webhooks (public)
- 005-create-billing-tables.sql: 7 billing tables + invoice sequence +
  ALTER tenants to add quota columns
- run-migrations.ts: 005 runs as part of shared schema step

## Phase 4 - Frontend

### Web Admin (Next.js)
New pages:
- /billing: subscription card + token usage bar + warning banner + invoices
- /billing/plans: comparison grid with USD/CNY toggle + upgrade/downgrade flow
- /billing/invoices: paginated table with Pay Now button
Sidebar: Billing group (CreditCard icon, 3 sub-items)
i18n: billing keys added to en + zh sidebar translations

### Flutter App
New feature module it0_app/lib/features/billing/:
- BillingOverviewPage: plan card + token LinearProgressIndicator +
  latest invoice + upgrade button
- BillingProvider (FutureProvider): parallel fetch subscription/quota/invoice
Settings page: "订阅与用量" entry card
Router: /settings/billing sub-route

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 21:09:17 -08:00
hailin da17488389 feat: voice mode event filtering — skip tool/thinking events for Agent SDK
1. Remove on_enter greeting entirely (no more race condition)
2. voice-agent sends voiceMode: true when engine_type is claude_agent_sdk
3. AgentController.runTaskStream() filters thinking, tool_use, tool_result
   events in voice mode — only text, completed, error reach the client
4. Detailed logging: each event logged with [FILTERED-voice] tag when skipped

Claude API mode is completely unaffected (voiceMode defaults to false).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 02:56:41 -08:00
hailin e4c2505048 feat: add multimodal image input with streaming markdown optimization
Two major features in this commit:

1. Streaming Markdown Rendering Optimization
   - Replace deprecated flutter_markdown with gpt_markdown (active, AI-optimized)
   - Real-time markdown rendering during streaming (was showing raw syntax)
   - Solid block cursor (█) instead of AnimationController blink
   - 80ms token throttle buffer reducing rebuilds from per-token to ~12.5/sec
   - RepaintBoundary isolation for markdown widget repaints
   - StreamTextWidget simplified from StatefulWidget to StatelessWidget

2. Multimodal Image Input (camera + gallery + display)
   - Flutter: image_picker for gallery/camera, base64 encoding, attachment
     preview strip with delete, thumbnails in sent messages
   - Data layer: List<String>? → List<Map<String, dynamic>>? for structured
     attachment payloads through datasource/repository/usecase
   - ChatAttachment model with base64Data, mediaType, fileName
   - ChatMessage entity + ChatMessageModel both support attachments field
   - Backend DTO, Entity (JSONB), Controller, ConversationContextService
     all extended to receive, store, and reconstruct Anthropic image
     content blocks in loadContext()
   - Claude API engine skips duplicate user message when history already
     ends with multimodal content blocks
   - NestJS body parser limit raised to 10MB for base64 image payloads
   - Android CAMERA permission added to manifest
   - Image.memory uses cacheWidth/cacheHeight for memory efficiency
   - Max 5 images per message enforced in UI

Data flow:
  ImagePicker → base64Encode → ChatAttachment → POST body →
  DB (JSONB) → loadContext → Anthropic image content blocks → Claude API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 03:24:17 -08:00
hailin 50dbb641a3 fix: comprehensive hardening of agent task cancel/inject/approve flows
6 rounds of systematic audit identified and fixed 14 bugs across
backend controller and Flutter client:

## Backend (agent.controller.ts)

Security & Tenant Isolation:
- Add @TenantId + ForbiddenException check to cancelTask, injectMessage,
  approveCommand — all 4 write endpoints now enforce tenant isolation
- Add tenantId check on session reuse in executeTask to prevent
  cross-tenant session hijacking

Architecture & Correctness:
- Extract shared runTaskStream() from inline fire-and-forget block,
  used by both executeTask and injectMessage to reduce duplication
- Use session.engineType (not getActiveEngine()) in cancelTask,
  injectMessage, approveCommand — fixes wrong-engine-cancel when
  global engine config is switched after task creation
- Add concurrent task prevention: executeTask checks for existing
  RUNNING task on same session and cancels it before starting new one
- Add runningTasks Map to track task promises, awaitTaskCleanup()
  helper with 3s timeout for inject to wait for partial text save
- captureSdkSessionId() captures SDK session ID into metadata
  without DB save (callers persist), preventing fire-and-forget race

Cancel/Reject Improvements:
- cancelTask: idempotent (returns early if already CANCELLED/COMPLETED),
  session stays 'active' (was 'cancelled'), emits cancelled WS event
- approveCommand reject: session stays 'active' (was 'cancelled'),
  now emits cancelled WS event so Flutter stream listeners clean up
- approveCommand approved: collect text events and save assistant
  response to conversation history on completion (was missing)

Minor:
- task.result! non-null assertion → task.result ?? 'Unknown error'
- Add findRunningBySessionId() to TaskRepository

## Flutter

API Contract Fix:
- approveCommand: route changed from /api/v1/ops/approvals/:id/approve
  to /api/v1/agent/tasks/:id/approve with {approved: true} body
- rejectCommand: route changed from /api/v1/ops/approvals/:id/reject
  to /api/v1/agent/tasks/:id/approve with {approved: false} body

Resource Management:
- ChatNotifier.dispose() now disconnects WebSocket to prevent
  connection leak when navigating away from chat

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 22:20:46 -08:00
hailin cc0f06e2be feat: SDK engine native resume with per-tenant HOME isolation
Replace prompt-prefix workaround with SDK's native resume mechanism.
Each tenant gets isolated HOME directory (/data/claude-tenants/{tenantId})
to prevent cross-tenant session file mixing. SDK session IDs are persisted
in session.metadata for cross-request resume support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 02:27:38 -08:00
hailin 2403ce5636 feat: multi-turn conversation context management with session history UI
Implement DB-based conversation message storage (engine-agnostic) that
works across both Claude API and Agent SDK engines. Add ChatGPT/Claude-style
conversation history drawer in Flutter with date-grouped session list,
session switching, and new chat functionality.

Backend: entity, repository, context service, migration 004, session/message
API endpoints. Flutter: ConversationDrawer, sessionId flow from backend
response via SessionInfoEvent, session list/switch/delete support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 19:04:35 -08:00
hailin 5d4fd96d43 feat: streaming claude-api engine, engineType override, fix voice test page
- Claude API engine now uses streaming API (messages.stream) for real-time
  text delta output instead of waiting for full response
- Agent controller accepts optional engineType body parameter to allow
  callers (e.g. voice pipeline) to select a specific engine
- Fix voice_test_page.dart compilation error: replace audioplayers (not
  installed) with flutter_sound (already in pubspec.yaml)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 05:30:11 -08:00
hailin 2a150dcff5 fix: prevent error event from overriding completed status in controller
Add finished guard so that once a task reaches completed/error terminal
state, subsequent events don't flip the status back.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:49:21 -08:00
hailin a7b42e6b98 feat: add detailed logging to agent engine and task controller
Log every SDK message type, event emission, and stream lifecycle
to diagnose why text events are missing in voice-agent flow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:56:09 -08:00
hailin 806113554b fix: remove AuthGuard('jwt') from all service controllers
Kong handles JWT validation at the gateway level. Service-level
AuthGuard('jwt') fails because services don't register a Passport
JWT strategy (only auth-service does). Removed from 17 controllers
across ops, inventory, monitor, comm, audit, and agent services.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 23:42:37 -08:00
hailin 00f8801d51 Initial commit: IT0 AI-powered server cluster operations platform
Full-stack monorepo with DDD + Clean Architecture:
- Backend: 7 NestJS microservices + 5 shared libraries (TypeScript)
- Mobile: Flutter app with Riverpod (Dart)
- Web Admin: Next.js dashboard with Zustand + React Query
- Voice: Python voice service (STT/TTS/VAD)
- Infra: Docker Compose, K8s manifests, Turborepo build

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 22:54:37 -08:00