hailin/it0 - it0 - AI Wolves Team

Commit Graph

Author	SHA1	Message	Date
hailin	2182149c4c	feat(chat): voice-to-text fills input box instead of auto-sending - Add POST /api/v1/agent/transcribe endpoint (STT only, no agent trigger) - Add transcribeAudio() to chat datasource and provider - VoiceMicButton now fills the text input field with transcript; user reviews and sends manually - Add OPENAI_API_KEY/OPENAI_BASE_URL to agent-service in docker-compose Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 07:01:39 -08:00
hailin	a2af76bcd7	feat(agent-service): add voice message endpoint with Whisper STT and async interrupt New endpoint: POST /api/v1/agent/sessions/:sessionId/voice-message - Accepts multipart/form-data audio file (any format Whisper supports) - Transcribes via OpenAI Whisper API (routed through existing proxy) - If a task is currently running in the session → hard-interrupts it first (same cancel+inject pattern as text inject, triggered by voice command) - Otherwise → starts a fresh task with the transcript - Returns { sessionId, taskId, transcript } so client can subscribe to WS stream This enables WhatsApp-style push-to-talk and doubles as an async voice interrupt into any active agent workflow, bypassing the need for speaker diarization (whoever presses record owns the message). New files: infrastructure/stt/openai-stt.service.ts — OpenAI Whisper client, manually builds multipart/form-data, supports self-signed proxy cert Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 03:12:03 -08:00
hailin	635cca18fa	feat(voice): long-lived agent session with proper hangup termination Replace the per-turn POST /tasks approach for voice calls with a long-lived agent run loop tied to the call lifecycle: agent-service: - Add AsyncQueue<T> utility for blocking message relay - Add VoiceSessionManager: spawns one background run loop per voice call, accepts injected messages, terminates cleanly on hangup - Add VoiceSessionController with 3 endpoints: POST /api/v1/agent/sessions/voice/start (call start) POST /api/v1/agent/sessions/:id/voice/inject (each speech turn) DELETE /api/v1/agent/sessions/:id/voice (user hung up) - Register VoiceSessionManager + VoiceSessionController in agent.module.ts voice-agent: - AgentServiceLLM: add start_voice_session(), terminate_voice_session(), inject_text_message() (voice/inject-aware), _do_inject_voice() - AgentServiceLLMStream._run(): use voice/inject path when voice session is active; fall back to per-task POST for text-chat / non-SDK engines - entrypoint(): call start_voice_session() after session.start(); register _on_room_disconnect that calls terminate_voice_session() so the agent is always killed when the user hangs up Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-04 04:01:02 -08:00
hailin	6ca8aab243	fix(agent-service): store proper title in session metadata, exclude systemPrompt from list API Two issues fixed: 1. agent.controller.ts — on the FIRST task of each session, write title+voiceMode into session.metadata so the client can display a meaningful conversation title: - Text sessions: metadata.title = first 40 chars of user prompt - Voice sessions: metadata.title = '' + metadata.voiceMode = true (Flutter renders these as '语音对话 M/D HH:mm', i.e. 'Voice chat M/D HH:mm') titleSet flag prevents overwriting the title on subsequent turns of the same session. 2. session.controller.ts — listSessions() now returns a DTO instead of the raw entity. systemPrompt is an internal engine instruction and is explicitly excluded from the response. The client receives { id, status, engineType, metadata, createdAt, updatedAt }.	2026-03-04 02:39:47 -08:00
hailin	9ed80cd0bc	feat: implement complete commercial monetization loop (Phases 1-4) ## Phase 1 - Token Metering + Quota Enforcement ### Usage Tracking - agent-service: add UsageRecord entity (per-tenant schema) tracking inputTokens/outputTokens/costUsd per AI task - Modify all 3 AI engines (claude-api, claude-code-cli, claude-agent-sdk) to emit separate input/output token counts in the `completed` event - claude-api-engine: costUsd = (input*3 + output*15) / 1,000,000 (claude-sonnet-4-5 pricing: $3/MTok in, $15/MTok out) - agent.controller: persist UsageRecord and publish `usage.recorded` event to Redis Streams on every task completion (non-blocking) - shared/events: new events UsageRecordedEvent, SubscriptionChangedEvent, QuotaExceededEvent, PaymentReceivedEvent ### Quota Enforcement - TenantInfo: add maxServers, maxUsers, maxStandingOrders, maxAgentTokensPerMonth fields - TenantContextMiddleware: rewritten to query public.tenants table for real quota values; 5-min in-memory cache; plan-based fallback on error - TenantContextService: getTenant() returns null instead of throwing; added getTenantOrThrow() for strict callers - inventory-service/server.controller: 429 when maxServers exceeded - ops-service/standing-order.controller: 429 when maxStandingOrders exceeded - auth-service/auth.service: 429 when maxUsers exceeded - 002-create-tenant-schema-template.sql: add usage_records table ## Phase 2 - billing-service (New Microservice, port 3010) ### Domain Layer (public schema, all UUIDs) Entities: Plan, Subscription, Invoice, InvoiceItem, Payment, PaymentMethod, UsageAggregate Domain services: - SubscriptionLifecycleService: full state machine (trialing -> active -> past_due -> cancelled/expired); upgrades immediate, downgrades at period end - InvoiceGeneratorService: monthly invoice = base fee + overage charges; proration item for mid-cycle upgrades - OverageCalculatorService: (totalTokens - includedTokens) * overageRate ### Infrastructure (all repos use DataSource directly, NOT TenantAwareRepository) - PlanRepository, SubscriptionRepository, InvoiceRepository (atomic transaction for invoice+items), PaymentRepository (payments + methods), UsageAggregateRepository (UPSERT via ON CONFLICT for atomic accumulation) ### Application Use Cases - CreateSubscriptionUseCase: called on tenant registration - ChangePlanUseCase: upgrade (immediate + proration) or downgrade (scheduled) - CancelSubscriptionUseCase: immediate or at-period-end - GenerateMonthlyInvoiceUseCase: cron target (1st of month 00:05 UTC); generates invoices, renews periods, applies scheduled downgrades - AggregateUsageUseCase: Redis Streams consumer group billing-service, upserts monthly usage aggregates from usage.recorded events - CheckTokenQuotaUseCase: hard limit enforcement per plan - CreatePaymentSessionUseCase + HandlePaymentWebhookUseCase ### REST API - GET /api/v1/billing/plans - GET/POST /api/v1/billing/subscription (+ /upgrade, /cancel) - GET /api/v1/billing/invoices (paginated) - GET /api/v1/billing/invoices/:id - POST /api/v1/billing/invoices/:id/pay - GET /api/v1/billing/usage/current + /history - CRUD /api/v1/billing/payment-methods - POST /api/v1/billing/webhooks/{stripe,alipay,wechat,crypto} ### Plan Seed (auto on startup via PlanSeedService) - free: $0/mo, 100K tokens, no overage, hard limit 100% - pro: $49.99/mo, 1M tokens, $8/MTok, hard limit 150% - enterprise: $199.99/mo, 10M tokens, $5/MTok, no hard limit ## Phase 3 - Payment Provider Integration ### PaymentProviderRegistry (Strategy Pattern, mirrors EngineRegistry) All providers use @Optional() injection; unconfigured providers omitted - StripeProvider: PaymentIntent API; webhook via stripe.webhooks.constructEvent - AlipayProvider: alipay-sdk; Native QR (precreate); RSA2 signature verify - WeChatPayProvider: v3 REST; Native Pay code_url; AES-256-GCM decrypt; HMAC-SHA256 request signing and webhook verification - CryptoProvider: Coinbase Commerce; hosted checkout; HMAC-SHA256 verify ### WebhookController All 4 webhook endpoints are public (no JWT) for payment provider callbacks. rawBody: true enabled in main.ts for signature verification. ## Infrastructure Changes - docker-compose.yml: billing-service container (port 13010); added as dependency of api-gateway - kong.yml: /api/v1/billing routes (JWT); /api/v1/billing/webhooks (public) - 005-create-billing-tables.sql: 7 billing tables + invoice sequence + ALTER tenants to add quota columns - run-migrations.ts: 005 runs as part of shared schema step ## Phase 4 - Frontend ### Web Admin (Next.js) New pages: - /billing: subscription card + token usage bar + warning banner + invoices - /billing/plans: comparison grid with USD/CNY toggle + upgrade/downgrade flow - /billing/invoices: paginated table with Pay Now button Sidebar: Billing group (CreditCard icon, 3 sub-items) i18n: billing keys added to en + zh sidebar translations ### Flutter App New feature module it0_app/lib/features/billing/: - BillingOverviewPage: plan card + token LinearProgressIndicator + latest invoice + upgrade button - BillingProvider (FutureProvider): parallel fetch subscription/quota/invoice Settings page: "订阅与用量" ("Subscription & Usage") entry card Router: /settings/billing sub-route Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-03 21:09:17 -08:00
hailin	7fb0d1de95	refactor: remove Speechmatics STT integration entirely, default to OpenAI - Delete speechmatics_stt.py plugin - Remove speechmatics branch from voice-agent entrypoint - Remove livekit-plugins-speechmatics dependency - Change default stt_provider to 'openai' in entity, controller, and UI - Remove SPEECHMATICS_API_KEY from docker-compose.yml - Remove speechmatics option from web-admin settings dropdown Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 04:58:38 -08:00
hailin	e32a3a9800	fix: use @TenantId() decorator in VoiceConfigController for JWT tenant extraction Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 22:30:37 -08:00
hailin	f9c47de04b	feat: add STT provider switching (OpenAI ↔ Speechmatics) in settings - Add VoiceConfig entity/repo/service/controller in agent-service for per-tenant STT provider persistence (default: speechmatics) - Add Speechmatics STT plugin in voice-agent with livekit-plugins-speechmatics - Modify voice-agent entrypoint for 3-way STT selection: metadata > agent-service config > env var fallback - Add "Voice" section in web-admin settings page with STT provider dropdown - Add i18n translations (en/zh) for voice settings - Add SPEECHMATICS_API_KEY env var in docker-compose Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 22:13:18 -08:00
hailin	da17488389	feat: voice mode event filtering — skip tool/thinking events for Agent SDK 1. Remove on_enter greeting entirely (no more race condition) 2. voice-agent sends voiceMode: true when engine_type is claude_agent_sdk 3. AgentController.runTaskStream() filters thinking, tool_use, tool_result events in voice mode — only text, completed, error reach the client 4. Detailed logging: each event logged with [FILTERED-voice] tag when skipped Claude API mode is completely unaffected (voiceMode defaults to false). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 02:56:41 -08:00
hailin	e4c2505048	feat: add multimodal image input with streaming markdown optimization Two major features in this commit: 1. Streaming Markdown Rendering Optimization - Replace deprecated flutter_markdown with gpt_markdown (active, AI-optimized) - Real-time markdown rendering during streaming (was showing raw syntax) - Solid block cursor (█) instead of AnimationController blink - 80ms token throttle buffer reducing rebuilds from per-token to ~12.5/sec - RepaintBoundary isolation for markdown widget repaints - StreamTextWidget simplified from StatefulWidget to StatelessWidget 2. Multimodal Image Input (camera + gallery + display) - Flutter: image_picker for gallery/camera, base64 encoding, attachment preview strip with delete, thumbnails in sent messages - Data layer: List<String>? → List<Map<String, dynamic>>? for structured attachment payloads through datasource/repository/usecase - ChatAttachment model with base64Data, mediaType, fileName - ChatMessage entity + ChatMessageModel both support attachments field - Backend DTO, Entity (JSONB), Controller, ConversationContextService all extended to receive, store, and reconstruct Anthropic image content blocks in loadContext() - Claude API engine skips duplicate user message when history already ends with multimodal content blocks - NestJS body parser limit raised to 10MB for base64 image payloads - Android CAMERA permission added to manifest - Image.memory uses cacheWidth/cacheHeight for memory efficiency - Max 5 images per message enforced in UI Data flow: ImagePicker → base64Encode → ChatAttachment → POST body → DB (JSONB) → loadContext → Anthropic image content blocks → Claude API Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 03:24:17 -08:00
hailin	50dbb641a3	fix: comprehensive hardening of agent task cancel/inject/approve flows 6 rounds of systematic audit identified and fixed 14 bugs across backend controller and Flutter client: ## Backend (agent.controller.ts) Security & Tenant Isolation: - Add @TenantId + ForbiddenException check to cancelTask, injectMessage, approveCommand — all 4 write endpoints now enforce tenant isolation - Add tenantId check on session reuse in executeTask to prevent cross-tenant session hijacking Architecture & Correctness: - Extract shared runTaskStream() from inline fire-and-forget block, used by both executeTask and injectMessage to reduce duplication - Use session.engineType (not getActiveEngine()) in cancelTask, injectMessage, approveCommand — fixes wrong-engine-cancel when global engine config is switched after task creation - Add concurrent task prevention: executeTask checks for existing RUNNING task on same session and cancels it before starting new one - Add runningTasks Map to track task promises, awaitTaskCleanup() helper with 3s timeout for inject to wait for partial text save - captureSdkSessionId() captures SDK session ID into metadata without DB save (callers persist), preventing fire-and-forget race Cancel/Reject Improvements: - cancelTask: idempotent (returns early if already CANCELLED/COMPLETED), session stays 'active' (was 'cancelled'), emits cancelled WS event - approveCommand reject: session stays 'active' (was 'cancelled'), now emits cancelled WS event so Flutter stream listeners clean up - approveCommand approved: collect text events and save assistant response to conversation history on completion (was missing) Minor: - task.result! non-null assertion → task.result ?? 'Unknown error' - Add findRunningBySessionId() to TaskRepository ## Flutter API Contract Fix: - approveCommand: route changed from /api/v1/ops/approvals/:id/approve to /api/v1/agent/tasks/:id/approve with {approved: true} body - rejectCommand: route changed from /api/v1/ops/approvals/:id/reject to /api/v1/agent/tasks/:id/approve with {approved: false} body Resource Management: - ChatNotifier.dispose() now disconnects WebSocket to prevent connection leak when navigating away from chat Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 22:20:46 -08:00
hailin	cc0f06e2be	feat: SDK engine native resume with per-tenant HOME isolation Replace prompt-prefix workaround with SDK's native resume mechanism. Each tenant gets isolated HOME directory (/data/claude-tenants/{tenantId}) to prevent cross-tenant session file mixing. SDK session IDs are persisted in session.metadata for cross-request resume support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 02:27:38 -08:00
hailin	2403ce5636	feat: multi-turn conversation context management with session history UI Implement DB-based conversation message storage (engine-agnostic) that works across both Claude API and Agent SDK engines. Add ChatGPT/Claude-style conversation history drawer in Flutter with date-grouped session list, session switching, and new chat functionality. Backend: entity, repository, context service, migration 004, session/message API endpoints. Flutter: ConversationDrawer, sessionId flow from backend response via SessionInfoEvent, session list/switch/delete support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 19:04:35 -08:00
hailin	5d4fd96d43	feat: streaming claude-api engine, engineType override, fix voice test page - Claude API engine now uses streaming API (messages.stream) for real-time text delta output instead of waiting for full response - Agent controller accepts optional engineType body parameter to allow callers (e.g. voice pipeline) to select a specific engine - Fix voice_test_page.dart compilation error: replace audioplayers (not installed) with flutter_sound (already in pubspec.yaml) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 05:30:11 -08:00
hailin	2a150dcff5	fix: prevent error event from overriding completed status in controller Add finished guard so that once a task reaches completed/error terminal state, subsequent events don't flip the status back. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:49:21 -08:00
hailin	a7b42e6b98	feat: add detailed logging to agent engine and task controller Log every SDK message type, event emission, and stream lifecycle to diagnose why text events are missing in voice-agent flow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 02:56:09 -08:00
hailin	1d5c834dfe	feat: add event buffering to agent WS gateway for late subscribers Buffer stream events when no WS clients are subscribed yet, then replay them when a client subscribes. This eliminates the race condition where events are lost between task creation and WS subscription. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 02:41:38 -08:00
hailin	86d7cac631	fix: replace Socket.IO with raw WebSocket to fix 502 on /ws/agent Socket.IO requires its own handshake protocol (EIO=4) which Kong cannot proxy as a plain WebSocket upgrade, causing 502 Bad Gateway. Switch to @nestjs/platform-ws (WsAdapter) with manual session room tracking so Flutter's IOWebSocketChannel can connect directly. Also add ws/wss protocols to Kong WebSocket routes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 16:52:43 -08:00
hailin	806113554b	fix: remove AuthGuard('jwt') from all service controllers Kong handles JWT validation at the gateway level. Service-level AuthGuard('jwt') fails because services don't register a Passport JWT strategy (only auth-service does). Removed from 17 controllers across ops, inventory, monitor, comm, audit, and agent services. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 23:42:37 -08:00
hailin	d8cb2a9c6f	fix: use standard TypeORM repos and header-based tenant extraction - Replace TenantAwareRepository with standard @InjectRepository (TenantAwareRepository requires AsyncLocalStorage tenant context middleware which agent-service does not have) - Replace @TenantId() decorator with @Headers('x-tenant-id') for direct HTTP header extraction - Return defaults gracefully when no tenant is selected Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 22:41:30 -08:00
hailin	f897cfe240	fix: remove AuthGuard('jwt') from agent-service controllers Agent-service does not have a registered Passport JWT strategy — JWT validation is handled by Kong API gateway. The AuthGuard was causing 500 "Unknown authentication strategy" errors on all new controller endpoints. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 22:36:46 -08:00
hailin	5ee1227800	feat: add backend controllers for agent config, skills, and hooks Implement missing REST API endpoints that the web-admin frontend pages were calling but had no backend support: - GET/POST/PUT /api/v1/agent-config (engine, prompt, turns, budget, tools) - GET/POST/PUT/DELETE /api/v1/agent/skills (CRUD for agent skills) - GET/POST/PUT/DELETE /api/v1/agent/hooks (CRUD for hook scripts) Each endpoint includes entity, repository, service, and controller layers following the existing DDD + tenant-aware patterns. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 22:26:25 -08:00
hailin	c75ad27771	feat: add Claude Agent SDK engine with multi-tenant support Add @anthropic-ai/claude-agent-sdk as a third engine (pure additive, no changes to existing CLI/API engines). Includes full frontend admin page. Backend (agent-service): - ClaudeAgentSdkEngine: implements AgentEnginePort using SDK's query() API - ApprovalGate: L2 tool approval with configurable auto-approve timeout (default 120s) - TenantAgentConfig entity: per-tenant billing mode, encrypted API key, timeout, tool lists - AllowedToolsResolverService: RBAC-based tool whitelist (admin/operator/viewer) - TenantAgentConfigController: REST endpoints for admin config management - Default subscription billing (operator's Claude login, no API key needed) - Optional per-tenant API key with AES-256-GCM encryption Frontend (web-admin): - SDK Config page at /agent-config/sdk with billing, timeout, tool permissions - Sidebar navigation entry under Agent Config - React Query key for tenant SDK config Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 18:38:30 -08:00
hailin	00f8801d51	Initial commit: IT0 AI-powered server cluster operations platform Full-stack monorepo with DDD + Clean Architecture: - Backend: 7 NestJS microservices + 5 shared libraries (TypeScript) - Mobile: Flutter app with Riverpod (Dart) - Web Admin: Next.js dashboard with Zustand + React Query - Voice: Python voice service (STT/TTS/VAD) - Infra: Docker Compose, K8s manifests, Turborepo build Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 22:54:37 -08:00

24 Commits
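
The token-cost and overage formulas quoted in commit 9ed80cd0bc can be sketched as follows. This is an illustration only, not the repository's code: the function and constant names are hypothetical, and two details are assumptions that the commit message does not state, namely that the overage rate is expressed in USD per million tokens (as in the plan seed, e.g. $8/MTok for pro) and that usage below the included allowance produces no charge.

```typescript
// Prices in USD per million tokens, as quoted in the commit message
// (claude-sonnet-4-5: $3/MTok in, $15/MTok out).
const INPUT_USD_PER_MTOK = 3;
const OUTPUT_USD_PER_MTOK = 15;
const TOKENS_PER_MTOK = 1_000_000;

// costUsd = (input*3 + output*15) / 1,000,000
// (hypothetical helper name; the formula is from the commit message)
function costUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * INPUT_USD_PER_MTOK + outputTokens * OUTPUT_USD_PER_MTOK) /
    TOKENS_PER_MTOK
  );
}

// Overage = (totalTokens - includedTokens) * overageRate.
// Assumptions: the rate is USD per MTok, and the excess is clamped at zero
// so that tenants under their allowance are not credited.
function overageUsd(
  totalTokens: number,
  includedTokens: number,
  overageRateUsdPerMTok: number,
): number {
  const excessTokens = Math.max(0, totalTokens - includedTokens);
  return (excessTokens / TOKENS_PER_MTOK) * overageRateUsdPerMTok;
}
```

Under these assumptions, a task that uses 1,000 input and 500 output tokens costs about $0.0105, and a pro tenant (1M included tokens, $8/MTok) that consumes 1.5M tokens in a month would owe $4 in overage.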