Commit Graph

24 Commits

Author SHA1 Message Date
hailin 2182149c4c feat(chat): voice-to-text fills input box instead of auto-sending
- Add POST /api/v1/agent/transcribe endpoint (STT only, no agent trigger)
- Add transcribeAudio() to chat datasource and provider
- VoiceMicButton now fills the text input field with transcript;
  user reviews and sends manually
- Add OPENAI_API_KEY/OPENAI_BASE_URL to agent-service in docker-compose

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 07:01:39 -08:00
hailin a2af76bcd7 feat(agent-service): add voice message endpoint with Whisper STT and async interrupt
New endpoint: POST /api/v1/agent/sessions/:sessionId/voice-message
- Accepts multipart/form-data audio file (any format Whisper supports)
- Transcribes via OpenAI Whisper API (routed through existing proxy)
- If a task is currently running in the session → hard-interrupts it first
  (same cancel+inject pattern as text inject, triggered by voice command)
- Otherwise → starts a fresh task with the transcript
- Returns { sessionId, taskId, transcript } so client can subscribe to WS stream

This enables WhatsApp-style push-to-talk and doubles as an async voice
interrupt into any active agent workflow, bypassing the need for speaker
diarization (whoever presses record owns the message).

New files:
  infrastructure/stt/openai-stt.service.ts — OpenAI Whisper client,
  manually builds multipart/form-data, supports self-signed proxy cert

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 03:12:03 -08:00
hailin 635cca18fa feat(voice): long-lived agent session with proper hangup termination
Replace the per-turn POST /tasks approach for voice calls with a
long-lived agent run loop tied to the call lifecycle:

agent-service:
- Add AsyncQueue<T> utility for blocking message relay
- Add VoiceSessionManager: spawns one background run loop per voice call,
  accepts injected messages, terminates cleanly on hangup
- Add VoiceSessionController with 3 endpoints:
    POST   /api/v1/agent/sessions/voice/start  (call start)
    POST   /api/v1/agent/sessions/:id/voice/inject  (each speech turn)
    DELETE /api/v1/agent/sessions/:id/voice    (user hung up)
- Register VoiceSessionManager + VoiceSessionController in agent.module.ts

voice-agent:
- AgentServiceLLM: add start_voice_session(), terminate_voice_session(),
  inject_text_message() (voice/inject-aware), _do_inject_voice()
- AgentServiceLLMStream._run(): use voice/inject path when voice session
  is active; fall back to per-task POST for text-chat / non-SDK engines
- entrypoint(): call start_voice_session() after session.start();
  register _on_room_disconnect that calls terminate_voice_session()
  so the agent is always killed when the user hangs up

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 04:01:02 -08:00
hailin 6ca8aab243 fix(agent-service): store proper title in session metadata, exclude systemPrompt from list API
Two issues fixed:

1. agent.controller.ts — on the FIRST task of each session, write title+voiceMode
   into session.metadata so the client can display a meaningful conversation title:
     - Text sessions: metadata.title = first 40 chars of user prompt
     - Voice sessions: metadata.title = '' + metadata.voiceMode = true
       (Flutter renders these as '语音对话 M/D HH:mm')
   titleSet flag prevents overwriting the title on subsequent turns of the same session.

2. session.controller.ts — listSessions() now returns a DTO instead of the raw entity.
   systemPrompt is an internal engine instruction and is explicitly excluded from the
   response. The client receives { id, status, engineType, metadata, createdAt, updatedAt }.
2026-03-04 02:39:47 -08:00
hailin 9ed80cd0bc feat: implement complete commercial monetization loop (Phases 1-4)
## Phase 1 - Token Metering + Quota Enforcement

### Usage Tracking
- agent-service: add UsageRecord entity (per-tenant schema) tracking
  inputTokens/outputTokens/costUsd per AI task
- Modify all 3 AI engines (claude-api, claude-code-cli, claude-agent-sdk)
  to emit separate input/output token counts in the `completed` event
- claude-api-engine: costUsd = (input*3 + output*15) / 1,000,000
  (claude-sonnet-4-5 pricing: $3/MTok in, $15/MTok out)
- agent.controller: persist UsageRecord and publish `usage.recorded`
  event to Redis Streams on every task completion (non-blocking)
- shared/events: new events UsageRecordedEvent, SubscriptionChangedEvent,
  QuotaExceededEvent, PaymentReceivedEvent

### Quota Enforcement
- TenantInfo: add maxServers, maxUsers, maxStandingOrders,
  maxAgentTokensPerMonth fields
- TenantContextMiddleware: rewritten to query public.tenants table for
  real quota values; 5-min in-memory cache; plan-based fallback on error
- TenantContextService: getTenant() returns null instead of throwing;
  added getTenantOrThrow() for strict callers
- inventory-service/server.controller: 429 when maxServers exceeded
- ops-service/standing-order.controller: 429 when maxStandingOrders exceeded
- auth-service/auth.service: 429 when maxUsers exceeded
- 002-create-tenant-schema-template.sql: add usage_records table

## Phase 2 - billing-service (New Microservice, port 3010)

### Domain Layer (public schema, all UUIDs)
Entities: Plan, Subscription, Invoice, InvoiceItem, Payment, PaymentMethod,
UsageAggregate

Domain services:
- SubscriptionLifecycleService: full state machine (trialing -> active ->
  past_due -> cancelled/expired); upgrades immediate, downgrades at period end
- InvoiceGeneratorService: monthly invoice = base fee + overage charges;
  proration item for mid-cycle upgrades
- OverageCalculatorService: (totalTokens - includedTokens) * overageRate

### Infrastructure (all repos use DataSource directly, NOT TenantAwareRepository)
- PlanRepository, SubscriptionRepository, InvoiceRepository (atomic
  transaction for invoice+items), PaymentRepository (payments + methods),
  UsageAggregateRepository (UPSERT via ON CONFLICT for atomic accumulation)

### Application Use Cases
- CreateSubscriptionUseCase: called on tenant registration
- ChangePlanUseCase: upgrade (immediate + proration) or downgrade (scheduled)
- CancelSubscriptionUseCase: immediate or at-period-end
- GenerateMonthlyInvoiceUseCase: cron target (1st of month 00:05 UTC);
  generates invoices, renews periods, applies scheduled downgrades
- AggregateUsageUseCase: Redis Streams consumer group billing-service,
  upserts monthly usage aggregates from usage.recorded events
- CheckTokenQuotaUseCase: hard limit enforcement per plan
- CreatePaymentSessionUseCase + HandlePaymentWebhookUseCase

### REST API
- GET  /api/v1/billing/plans
- GET/POST /api/v1/billing/subscription (+ /upgrade, /cancel)
- GET  /api/v1/billing/invoices (paginated)
- GET  /api/v1/billing/invoices/:id
- POST /api/v1/billing/invoices/:id/pay
- GET  /api/v1/billing/usage/current + /history
- CRUD /api/v1/billing/payment-methods
- POST /api/v1/billing/webhooks/{stripe,alipay,wechat,crypto}

### Plan Seed (auto on startup via PlanSeedService)
- free:       $0/mo,    100K tokens,  no overage,  hard limit 100%
- pro:        $49.99/mo, 1M tokens,  $8/MTok,  hard limit 150%
- enterprise: $199.99/mo, 10M tokens, $5/MTok, no hard limit

## Phase 3 - Payment Provider Integration

### PaymentProviderRegistry (Strategy Pattern, mirrors EngineRegistry)
All providers use @Optional() injection; unconfigured providers omitted

- StripeProvider: PaymentIntent API; webhook via stripe.webhooks.constructEvent
- AlipayProvider: alipay-sdk; Native QR (precreate); RSA2 signature verify
- WeChatPayProvider: v3 REST; Native Pay code_url; AES-256-GCM decrypt;
  HMAC-SHA256 request signing and webhook verification
- CryptoProvider: Coinbase Commerce; hosted checkout; HMAC-SHA256 verify

### WebhookController
All 4 webhook endpoints are public (no JWT) for payment provider callbacks.
rawBody: true enabled in main.ts for signature verification.

## Infrastructure Changes
- docker-compose.yml: billing-service container (port 13010);
  added as dependency of api-gateway
- kong.yml: /api/v1/billing routes (JWT); /api/v1/billing/webhooks (public)
- 005-create-billing-tables.sql: 7 billing tables + invoice sequence +
  ALTER tenants to add quota columns
- run-migrations.ts: 005 runs as part of shared schema step

## Phase 4 - Frontend

### Web Admin (Next.js)
New pages:
- /billing: subscription card + token usage bar + warning banner + invoices
- /billing/plans: comparison grid with USD/CNY toggle + upgrade/downgrade flow
- /billing/invoices: paginated table with Pay Now button
Sidebar: Billing group (CreditCard icon, 3 sub-items)
i18n: billing keys added to en + zh sidebar translations

### Flutter App
New feature module it0_app/lib/features/billing/:
- BillingOverviewPage: plan card + token LinearProgressIndicator +
  latest invoice + upgrade button
- BillingProvider (FutureProvider): parallel fetch subscription/quota/invoice
Settings page: "订阅与用量" entry card
Router: /settings/billing sub-route

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 21:09:17 -08:00
hailin 7fb0d1de95 refactor: remove Speechmatics STT integration entirely, default to OpenAI
- Delete speechmatics_stt.py plugin
- Remove speechmatics branch from voice-agent entrypoint
- Remove livekit-plugins-speechmatics dependency
- Change default stt_provider to 'openai' in entity, controller, and UI
- Remove SPEECHMATICS_API_KEY from docker-compose.yml
- Remove speechmatics option from web-admin settings dropdown

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 04:58:38 -08:00
hailin e32a3a9800 fix: use @TenantId() decorator in VoiceConfigController for JWT tenant extraction
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 22:30:37 -08:00
hailin f9c47de04b feat: add STT provider switching (OpenAI ↔ Speechmatics) in settings
- Add VoiceConfig entity/repo/service/controller in agent-service
  for per-tenant STT provider persistence (default: speechmatics)
- Add Speechmatics STT plugin in voice-agent with livekit-plugins-speechmatics
- Modify voice-agent entrypoint for 3-way STT selection:
  metadata > agent-service config > env var fallback
- Add "Voice" section in web-admin settings page with STT provider dropdown
- Add i18n translations (en/zh) for voice settings
- Add SPEECHMATICS_API_KEY env var in docker-compose

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 22:13:18 -08:00
hailin da17488389 feat: voice mode event filtering — skip tool/thinking events for Agent SDK
1. Remove on_enter greeting entirely (no more race condition)
2. voice-agent sends voiceMode: true when engine_type is claude_agent_sdk
3. AgentController.runTaskStream() filters thinking, tool_use, tool_result
   events in voice mode — only text, completed, error reach the client
4. Detailed logging: each event logged with [FILTERED-voice] tag when skipped

Claude API mode is completely unaffected (voiceMode defaults to false).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 02:56:41 -08:00
hailin e4c2505048 feat: add multimodal image input with streaming markdown optimization
Two major features in this commit:

1. Streaming Markdown Rendering Optimization
   - Replace deprecated flutter_markdown with gpt_markdown (active, AI-optimized)
   - Real-time markdown rendering during streaming (was showing raw syntax)
   - Solid block cursor (█) instead of AnimationController blink
   - 80ms token throttle buffer reducing rebuilds from per-token to ~12.5/sec
   - RepaintBoundary isolation for markdown widget repaints
   - StreamTextWidget simplified from StatefulWidget to StatelessWidget

2. Multimodal Image Input (camera + gallery + display)
   - Flutter: image_picker for gallery/camera, base64 encoding, attachment
     preview strip with delete, thumbnails in sent messages
   - Data layer: List<String>? → List<Map<String, dynamic>>? for structured
     attachment payloads through datasource/repository/usecase
   - ChatAttachment model with base64Data, mediaType, fileName
   - ChatMessage entity + ChatMessageModel both support attachments field
   - Backend DTO, Entity (JSONB), Controller, ConversationContextService
     all extended to receive, store, and reconstruct Anthropic image
     content blocks in loadContext()
   - Claude API engine skips duplicate user message when history already
     ends with multimodal content blocks
   - NestJS body parser limit raised to 10MB for base64 image payloads
   - Android CAMERA permission added to manifest
   - Image.memory uses cacheWidth/cacheHeight for memory efficiency
   - Max 5 images per message enforced in UI

Data flow:
  ImagePicker → base64Encode → ChatAttachment → POST body →
  DB (JSONB) → loadContext → Anthropic image content blocks → Claude API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 03:24:17 -08:00
hailin 50dbb641a3 fix: comprehensive hardening of agent task cancel/inject/approve flows
6 rounds of systematic audit identified and fixed 14 bugs across
backend controller and Flutter client:

## Backend (agent.controller.ts)

Security & Tenant Isolation:
- Add @TenantId + ForbiddenException check to cancelTask, injectMessage,
  approveCommand — all 4 write endpoints now enforce tenant isolation
- Add tenantId check on session reuse in executeTask to prevent
  cross-tenant session hijacking

Architecture & Correctness:
- Extract shared runTaskStream() from inline fire-and-forget block,
  used by both executeTask and injectMessage to reduce duplication
- Use session.engineType (not getActiveEngine()) in cancelTask,
  injectMessage, approveCommand — fixes wrong-engine-cancel when
  global engine config is switched after task creation
- Add concurrent task prevention: executeTask checks for existing
  RUNNING task on same session and cancels it before starting new one
- Add runningTasks Map to track task promises, awaitTaskCleanup()
  helper with 3s timeout for inject to wait for partial text save
- captureSdkSessionId() captures SDK session ID into metadata
  without DB save (callers persist), preventing fire-and-forget race

Cancel/Reject Improvements:
- cancelTask: idempotent (returns early if already CANCELLED/COMPLETED),
  session stays 'active' (was 'cancelled'), emits cancelled WS event
- approveCommand reject: session stays 'active' (was 'cancelled'),
  now emits cancelled WS event so Flutter stream listeners clean up
- approveCommand approved: collect text events and save assistant
  response to conversation history on completion (was missing)

Minor:
- task.result! non-null assertion → task.result ?? 'Unknown error'
- Add findRunningBySessionId() to TaskRepository

## Flutter

API Contract Fix:
- approveCommand: route changed from /api/v1/ops/approvals/:id/approve
  to /api/v1/agent/tasks/:id/approve with {approved: true} body
- rejectCommand: route changed from /api/v1/ops/approvals/:id/reject
  to /api/v1/agent/tasks/:id/approve with {approved: false} body

Resource Management:
- ChatNotifier.dispose() now disconnects WebSocket to prevent
  connection leak when navigating away from chat

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 22:20:46 -08:00
hailin cc0f06e2be feat: SDK engine native resume with per-tenant HOME isolation
Replace prompt-prefix workaround with SDK's native resume mechanism.
Each tenant gets isolated HOME directory (/data/claude-tenants/{tenantId})
to prevent cross-tenant session file mixing. SDK session IDs are persisted
in session.metadata for cross-request resume support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 02:27:38 -08:00
hailin 2403ce5636 feat: multi-turn conversation context management with session history UI
Implement DB-based conversation message storage (engine-agnostic) that
works across both Claude API and Agent SDK engines. Add ChatGPT/Claude-style
conversation history drawer in Flutter with date-grouped session list,
session switching, and new chat functionality.

Backend: entity, repository, context service, migration 004, session/message
API endpoints. Flutter: ConversationDrawer, sessionId flow from backend
response via SessionInfoEvent, session list/switch/delete support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 19:04:35 -08:00
hailin 5d4fd96d43 feat: streaming claude-api engine, engineType override, fix voice test page
- Claude API engine now uses streaming API (messages.stream) for real-time
  text delta output instead of waiting for full response
- Agent controller accepts optional engineType body parameter to allow
  callers (e.g. voice pipeline) to select a specific engine
- Fix voice_test_page.dart compilation error: replace audioplayers (not
  installed) with flutter_sound (already in pubspec.yaml)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 05:30:11 -08:00
hailin 2a150dcff5 fix: prevent error event from overriding completed status in controller
Add finished guard so that once a task reaches completed/error terminal
state, subsequent events don't flip the status back.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:49:21 -08:00
hailin a7b42e6b98 feat: add detailed logging to agent engine and task controller
Log every SDK message type, event emission, and stream lifecycle
to diagnose why text events are missing in voice-agent flow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:56:09 -08:00
hailin 1d5c834dfe feat: add event buffering to agent WS gateway for late subscribers
Buffer stream events when no WS clients are subscribed yet, then replay
them when a client subscribes. This eliminates the race condition where
events are lost between task creation and WS subscription.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:41:38 -08:00
hailin 86d7cac631 fix: replace Socket.IO with raw WebSocket to fix 502 on /ws/agent
Socket.IO requires its own handshake protocol (EIO=4) which Kong cannot
proxy as a plain WebSocket upgrade, causing 502 Bad Gateway. Switch to
@nestjs/platform-ws (WsAdapter) with manual session room tracking so
Flutter's IOWebSocketChannel can connect directly.

Also add ws/wss protocols to Kong WebSocket routes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 16:52:43 -08:00
hailin 806113554b fix: remove AuthGuard('jwt') from all service controllers
Kong handles JWT validation at the gateway level. Service-level
AuthGuard('jwt') fails because services don't register a Passport
JWT strategy (only auth-service does). Removed from 17 controllers
across ops, inventory, monitor, comm, audit, and agent services.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 23:42:37 -08:00
hailin d8cb2a9c6f fix: use standard TypeORM repos and header-based tenant extraction
- Replace TenantAwareRepository with standard @InjectRepository
  (TenantAwareRepository requires AsyncLocalStorage tenant context
  middleware which agent-service does not have)
- Replace @TenantId() decorator with @Headers('x-tenant-id')
  for direct HTTP header extraction
- Return defaults gracefully when no tenant is selected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 22:41:30 -08:00
hailin f897cfe240 fix: remove AuthGuard('jwt') from agent-service controllers
Agent-service does not have a registered Passport JWT strategy —
JWT validation is handled by Kong API gateway. The AuthGuard was
causing 500 "Unknown authentication strategy" errors on all
new controller endpoints.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 22:36:46 -08:00
hailin 5ee1227800 feat: add backend controllers for agent config, skills, and hooks
Implement missing REST API endpoints that the web-admin frontend
pages were calling but had no backend support:

- GET/POST/PUT /api/v1/agent-config (engine, prompt, turns, budget, tools)
- GET/POST/PUT/DELETE /api/v1/agent/skills (CRUD for agent skills)
- GET/POST/PUT/DELETE /api/v1/agent/hooks (CRUD for hook scripts)

Each endpoint includes entity, repository, service, and controller
layers following the existing DDD + tenant-aware patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 22:26:25 -08:00
hailin c75ad27771 feat: add Claude Agent SDK engine with multi-tenant support
Add @anthropic-ai/claude-agent-sdk as a third engine (pure additive, no changes
to existing CLI/API engines). Includes full frontend admin page.

Backend (agent-service):
- ClaudeAgentSdkEngine: implements AgentEnginePort using SDK's query() API
- ApprovalGate: L2 tool approval with configurable auto-approve timeout (default 120s)
- TenantAgentConfig entity: per-tenant billing mode, encrypted API key, timeout, tool lists
- AllowedToolsResolverService: RBAC-based tool whitelist (admin/operator/viewer)
- TenantAgentConfigController: REST endpoints for admin config management
- Default subscription billing (operator's Claude login, no API key needed)
- Optional per-tenant API key with AES-256-GCM encryption

Frontend (web-admin):
- SDK Config page at /agent-config/sdk with billing, timeout, tool permissions
- Sidebar navigation entry under Agent Config
- React Query key for tenant SDK config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 18:38:30 -08:00
hailin 00f8801d51 Initial commit: IT0 AI-powered server cluster operations platform
Full-stack monorepo with DDD + Clean Architecture:
- Backend: 7 NestJS microservices + 5 shared libraries (TypeScript)
- Mobile: Flutter app with Riverpod (Dart)
- Web Admin: Next.js dashboard with Zustand + React Query
- Voice: Python voice service (STT/TTS/VAD)
- Infra: Docker Compose, K8s manifests, Turborepo build

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 22:54:37 -08:00