184 Commits

Author SHA1 Message Date
hailin df3b1a6ec6 fix(billing-service): fix entity table names (billing_ prefix) and column mappings to match migration 2026-03-04 01:47:56 -08:00
hailin d96ea91815 fix(ops-service): add new TenantInfo quota fields to inline TenantContextService.run calls 2026-03-04 00:04:36 -08:00
hailin ffe06fab7a fix(billing-service): add tsconfig with workspace path aliases
The billing-service tsconfig.json was missing the TypeScript path aliases
required for the workspace build (turbo builds shared packages first, then
resolves @it0/* via paths). Without these, nest build fails with
'Cannot find module @it0/database'.

Also disables overly strict checks (strictNullChecks, strictPropertyInitialization,
useUnknownInCatchVariables) to match the lenient settings used by other services.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 23:32:34 -08:00
hailin 40ee84a0b7 fix(billing-service): resolve all TypeScript compilation errors
Comprehensive fix of 124 TS errors across the billing-service:

Entity fixes:
- invoice.entity.ts: add InvoiceStatus/InvoiceCurrency const objects,
  rename fields to match DB schema (subtotalCents, taxCents, totalCents,
  amountDueCents), add OneToMany items relation
- invoice-item.entity.ts: add InvoiceItemType const object, add column
  name mappings and currency field
- payment.entity.ts: add PaymentStatus const, rename amount→amountCents
  with column name mapping, add paidAt field
- subscription.entity.ts: add SubscriptionStatus const object
- usage-aggregate.entity.ts: rename periodYear/Month→year/month to match
  DB columns, add periodStart/periodEnd fields
- payment-method.entity.ts: add displayName, expiresAt, updatedAt fields

Port/Provider fixes:
- payment-provider.port.ts: make PaymentProviderType a const object (not
  just a type), add PaymentSessionRequest alias, rename WebhookEvent with
  correct field shape (type vs eventType), make providerPaymentId optional
- All 4 providers: replace PaymentSessionRequest→CreatePaymentParams,
  fix amountCents→amount, remove sessionId from PaymentSession return,
  add confirmPayment() stub, fix Stripe API version to '2023-10-16'

Use case fixes:
- aggregate-usage.use-case.ts: replace 'redis' with 'ioredis' (workspace
  standard); rewrite using ioredis xreadgroup API
- change/check/generate use cases: fix Plan field names
  (monthlyPriceCentsUsd, includedTokens, overageRateCentsPerMTokenUsd)
- generate-monthly-invoice: fix SubscriptionStatus/InvoiceCurrency as
  values (now const objects)
- handle-payment-webhook: fix WebhookResult import, result.type usage,
  payment.paidAt

Controller/Repository fixes:
- plan.controller.ts, plan.repository.ts: fix Plan field names
- webhook.controller.ts: remove express import, use any for req type
- invoice-generator.service.ts: fix overageAmountCents→overageCentsUsd,
  monthlyPriceCny→monthlyPriceFenCny, includedTokensPerMonth→includedTokens

Dependencies:
- billing-service/package.json: replace redis with ioredis dependency
- pnpm-lock.yaml: regenerated after ioredis addition

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 23:00:27 -08:00
hailin 9ed80cd0bc feat: implement complete commercial monetization loop (Phases 1-4)
## Phase 1 - Token Metering + Quota Enforcement

### Usage Tracking
- agent-service: add UsageRecord entity (per-tenant schema) tracking
  inputTokens/outputTokens/costUsd per AI task
- Modify all 3 AI engines (claude-api, claude-code-cli, claude-agent-sdk)
  to emit separate input/output token counts in the `completed` event
- claude-api-engine: costUsd = (input*3 + output*15) / 1,000,000
  (claude-sonnet-4-5 pricing: $3/MTok in, $15/MTok out)
- agent.controller: persist UsageRecord and publish `usage.recorded`
  event to Redis Streams on every task completion (non-blocking)
- shared/events: new events UsageRecordedEvent, SubscriptionChangedEvent,
  QuotaExceededEvent, PaymentReceivedEvent
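
The cost formula quoted above can be sketched as follows. This is an illustration in Python, not the engine's own code; the function and constant names are hypothetical, and only the two rates come from the message.

```python
# Rates quoted in the commit message (USD per million tokens).
INPUT_USD_PER_MTOK = 3.0
OUTPUT_USD_PER_MTOK = 15.0


def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task from separate input/output counts.

    Hypothetical helper: mirrors costUsd = (input*3 + output*15) / 1,000,000.
    """
    weighted = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return weighted / 1_000_000


if __name__ == "__main__":
    # One million input tokens and no output tokens cost exactly $3.
    print(cost_usd(1_000_000, 0))
```

Keeping input and output counts separate matters because the two rates differ by a factor of five, so a single combined token count could not be priced correctly.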

### Quota Enforcement
- TenantInfo: add maxServers, maxUsers, maxStandingOrders,
  maxAgentTokensPerMonth fields
- TenantContextMiddleware: rewritten to query public.tenants table for
  real quota values; 5-min in-memory cache; plan-based fallback on error
- TenantContextService: getTenant() returns null instead of throwing;
  added getTenantOrThrow() for strict callers
- inventory-service/server.controller: 429 when maxServers exceeded
- ops-service/standing-order.controller: 429 when maxStandingOrders exceeded
- auth-service/auth.service: 429 when maxUsers exceeded
- 002-create-tenant-schema-template.sql: add usage_records table
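
A minimal sketch of the quota path described above: a 5-minute in-memory cache, a plan-based fallback when the lookup fails, and a 429 when a limit is reached. It is written in Python for illustration only. The class name, the fallback numbers, and the choice of `>=` for "exceeded" are assumptions, not taken from the middleware.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 5 * 60  # 5-minute cache, as stated in the message

# Hypothetical plan defaults, used only when the tenants lookup fails.
PLAN_FALLBACK = {"free": {"maxServers": 5}, "pro": {"maxServers": 50}}


class QuotaCache:
    """Caches per-tenant quota rows and falls back to plan defaults on error."""

    def __init__(self, load: Callable[[str], dict],
                 clock: Callable[[], float] = time.monotonic):
        self._load = load      # stands in for the public.tenants query
        self._clock = clock    # injectable so the TTL can be tested
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, tenant_id: str, plan: str) -> dict:
        now = self._clock()
        hit = self._entries.get(tenant_id)
        if hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
            return hit[1]
        try:
            quota = self._load(tenant_id)
        except Exception:
            # Fallback is not cached, so the next request retries the lookup.
            return PLAN_FALLBACK[plan]
        self._entries[tenant_id] = (now, quota)
        return quota


def create_server_status(current_servers: int, quota: dict) -> int:
    """Return the HTTP status for a create-server request (assumed >= check)."""
    return 429 if current_servers >= quota["maxServers"] else 201
```

Not caching the fallback is a deliberate choice in this sketch: a transient database error should not pin a tenant to default limits for five minutes.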

## Phase 2 - billing-service (New Microservice, port 3010)

### Domain Layer (public schema, all UUIDs)
Entities: Plan, Subscription, Invoice, InvoiceItem, Payment, PaymentMethod,
UsageAggregate

Domain services:
- SubscriptionLifecycleService: full state machine (trialing -> active ->
  past_due -> cancelled/expired); upgrades immediate, downgrades at period end
- InvoiceGeneratorService: monthly invoice = base fee + overage charges;
  proration item for mid-cycle upgrades
- OverageCalculatorService: (totalTokens - includedTokens) * overageRate
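
The state machine and the overage formula above can be sketched like this (Python, for illustration). The transition table is an assumption: the message names only the main path, so every edge beyond trialing -> active -> past_due -> cancelled/expired is a guess. The overage helper also assumes the rate is expressed in cents per million tokens and that usage below the included amount costs nothing.

```python
# Assumed transition table; only the main path is named in the commit message.
ALLOWED_TRANSITIONS = {
    "trialing": {"active", "cancelled", "expired"},
    "active": {"past_due", "cancelled"},
    "past_due": {"active", "cancelled", "expired"},
    "cancelled": set(),
    "expired": set(),
}

TOKENS_PER_MTOK = 1_000_000


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target


def overage_cents(total_tokens: int, included_tokens: int,
                  rate_cents_per_mtok: int) -> int:
    """(totalTokens - includedTokens) * overageRate, clamped at zero.

    Uses integer arithmetic and rounds half up to whole cents.
    """
    over = max(0, total_tokens - included_tokens)
    return (over * rate_cents_per_mtok + TOKENS_PER_MTOK // 2) // TOKENS_PER_MTOK
```

For example, 1.5M tokens used on a plan with 1M included at 800 cents per MTok gives 400 cents of overage.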

### Infrastructure (all repos use DataSource directly, NOT TenantAwareRepository)
- PlanRepository, SubscriptionRepository, InvoiceRepository (atomic
  transaction for invoice+items), PaymentRepository (payments + methods),
  UsageAggregateRepository (UPSERT via ON CONFLICT for atomic accumulation)
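
To illustrate the "UPSERT via ON CONFLICT" accumulation, here is a small runnable sketch using Python's built-in sqlite3, whose upsert syntax follows PostgreSQL's. The table and column names are hypothetical, and the real repository runs against the service's own database through DataSource.

```python
import sqlite3

# In-memory database with a hypothetical aggregate table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE usage_aggregates (
        tenant_id    TEXT    NOT NULL,
        year         INTEGER NOT NULL,
        month        INTEGER NOT NULL,
        total_tokens INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (tenant_id, year, month)
    )
    """
)


def add_usage(tenant_id: str, year: int, month: int, tokens: int) -> None:
    """Insert a new monthly row, or add to the existing one in one statement."""
    conn.execute(
        """
        INSERT INTO usage_aggregates (tenant_id, year, month, total_tokens)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (tenant_id, year, month)
        DO UPDATE SET total_tokens = total_tokens + excluded.total_tokens
        """,
        (tenant_id, year, month, tokens),
    )
    conn.commit()


def total_for(tenant_id: str, year: int, month: int) -> int:
    row = conn.execute(
        "SELECT total_tokens FROM usage_aggregates "
        "WHERE tenant_id = ? AND year = ? AND month = ?",
        (tenant_id, year, month),
    ).fetchone()
    return row[0] if row else 0
```

Because the increment happens inside the statement, two consumers processing events for the same tenant and month cannot lose an update the way a read-modify-write sequence could.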

### Application Use Cases
- CreateSubscriptionUseCase: called on tenant registration
- ChangePlanUseCase: upgrade (immediate + proration) or downgrade (scheduled)
- CancelSubscriptionUseCase: immediate or at-period-end
- GenerateMonthlyInvoiceUseCase: cron target (1st of month 00:05 UTC);
  generates invoices, renews periods, applies scheduled downgrades
- AggregateUsageUseCase: Redis Streams consumer group billing-service,
  upserts monthly usage aggregates from usage.recorded events
- CheckTokenQuotaUseCase: hard limit enforcement per plan
- CreatePaymentSessionUseCase + HandlePaymentWebhookUseCase

### REST API
- GET  /api/v1/billing/plans
- GET/POST /api/v1/billing/subscription (+ /upgrade, /cancel)
- GET  /api/v1/billing/invoices (paginated)
- GET  /api/v1/billing/invoices/:id
- POST /api/v1/billing/invoices/:id/pay
- GET  /api/v1/billing/usage/current + /history
- CRUD /api/v1/billing/payment-methods
- POST /api/v1/billing/webhooks/{stripe,alipay,wechat,crypto}

### Plan Seed (auto on startup via PlanSeedService)
- free:       $0/mo,      100K tokens, no overage,      hard limit 100%
- pro:        $49.99/mo,  1M tokens,   $8/MTok overage, hard limit 150%
- enterprise: $199.99/mo, 10M tokens,  $5/MTok overage, no hard limit
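
One way to read the hard limit column is as a percentage of the included tokens at which further requests are refused. A minimal Python sketch under that assumption follows. The dictionary layout, the function name, and the `>=` comparison are hypothetical; the numbers are copied from the seed list, reading "100K" as 100,000.

```python
# Plan values from the seed list; the structure itself is an assumption.
PLANS = {
    "free": {"included_tokens": 100_000, "hard_limit_percent": 100},
    "pro": {"included_tokens": 1_000_000, "hard_limit_percent": 150},
    "enterprise": {"included_tokens": 10_000_000, "hard_limit_percent": None},
}


def is_blocked(plan: str, used_tokens: int) -> bool:
    """Return True once usage reaches the plan's hard limit, if it has one."""
    cfg = PLANS[plan]
    limit_percent = cfg["hard_limit_percent"]
    if limit_percent is None:
        return False  # "no hard limit": usage is billed as overage instead
    # Integer comparison avoids floating-point rounding at the boundary.
    return used_tokens * 100 >= cfg["included_tokens"] * limit_percent
```

Under this reading, a pro tenant can use up to 1.5M tokens in a month, with the last 500K billed as overage.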

## Phase 3 - Payment Provider Integration

### PaymentProviderRegistry (Strategy Pattern, mirrors EngineRegistry)
All providers use @Optional() injection; unconfigured providers omitted

- StripeProvider: PaymentIntent API; webhook via stripe.webhooks.constructEvent
- AlipayProvider: alipay-sdk; Native QR (precreate); RSA2 signature verify
- WeChatPayProvider: v3 REST; Native Pay code_url; AES-256-GCM decrypt;
  HMAC-SHA256 request signing and webhook verification
- CryptoProvider: Coinbase Commerce; hosted checkout; HMAC-SHA256 verify

### WebhookController
All 4 webhook endpoints are public (no JWT) for payment provider callbacks.
rawBody: true enabled in main.ts for signature verification.
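
The reason rawBody matters is that an HMAC is computed over the exact bytes the provider sent, so a re-serialized JSON body will not match. The sketch below shows a generic HMAC-SHA256 check in Python. It is not any specific provider's scheme: each provider defines its own headers, timestamp handling, and encoding, and the function name here is hypothetical.

```python
import hashlib
import hmac
import json


def verify_signature(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Generic check: hex HMAC-SHA256 of the raw request body."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time to avoid leaking match position.
    return hmac.compare_digest(expected.encode("utf-8"),
                               signature_hex.encode("utf-8"))


if __name__ == "__main__":
    secret = b"whsec_example"  # hypothetical secret
    raw = b'{"type": "payment.succeeded",  "amount": 4999}'
    sig = hmac.new(secret, raw, hashlib.sha256).hexdigest()

    print(verify_signature(raw, sig, secret))
    # Parsing and re-serializing changes whitespace, so the bytes differ.
    reserialized = json.dumps(json.loads(raw)).encode("utf-8")
    print(verify_signature(reserialized, sig, secret))
```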

## Infrastructure Changes
- docker-compose.yml: billing-service container (port 13010);
  added as dependency of api-gateway
- kong.yml: /api/v1/billing routes (JWT); /api/v1/billing/webhooks (public)
- 005-create-billing-tables.sql: 7 billing tables + invoice sequence +
  ALTER tenants to add quota columns
- run-migrations.ts: 005 runs as part of shared schema step

## Phase 4 - Frontend

### Web Admin (Next.js)
New pages:
- /billing: subscription card + token usage bar + warning banner + invoices
- /billing/plans: comparison grid with USD/CNY toggle + upgrade/downgrade flow
- /billing/invoices: paginated table with Pay Now button
Sidebar: Billing group (CreditCard icon, 3 sub-items)
i18n: billing keys added to en + zh sidebar translations

### Flutter App
New feature module it0_app/lib/features/billing/:
- BillingOverviewPage: plan card + token LinearProgressIndicator +
  latest invoice + upgrade button
- BillingProvider (FutureProvider): parallel fetch subscription/quota/invoice
Settings page: "订阅与用量" ("Subscription & Usage") entry card
Router: /settings/billing sub-route

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 21:09:17 -08:00
hailin 260195db50 fix(version-service): use DatabaseModule.forRoot() for correct build path
The entrypoint.sh expects dist/services/${SERVICE_NAME}/src/main, but
nest build with inline TypeORM config produces dist/main directly.
Using DatabaseModule from @it0/database forces tsc to emit the nested
path structure (since it references shared packages), matching the
entrypoint path convention used by all other services.

Also gains SnakeNamingStrategy and autoLoadEntities from the shared module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 08:04:12 -08:00
hailin f6dffe02c5 feat: add version-service for IT0 App version management
New NestJS microservice (port 3009) providing complete version management
API for IT0 App, designed to integrate with the existing mobile-upgrade
frontend (update.szaiai.com).

Backend — packages/services/version-service/ (9 new files):
- AppVersion entity: platform (ANDROID/IOS), versionName, buildNumber,
  changelog, downloadUrl, fileSize, isForceUpdate, isEnabled, minOsVersion
- REST controller with 8 endpoints:
  GET/POST /api/v1/versions — list (with platform/disabled filters) & create
  GET/PUT/DELETE /api/v1/versions/:id — single CRUD
  PATCH /api/v1/versions/:id/toggle — enable/disable
  POST /api/v1/versions/upload — multipart APK/IPA upload (500MB limit)
  POST /api/v1/versions/parse — extract version info from APK/IPA
- File storage: /data/versions/{platform}/ via Docker volume
- APK/IPA parsing: app-info-parser package
- Database: public.app_versions table (non-tenant, platform-level)
- No JWT auth (internal version management, consistent with existing apps)

Infrastructure changes:
- Dockerfile.service: added version-service package.json COPY lines
- docker-compose.yml: version-service container (13009:3009), version_data
  volume, api-gateway depends_on
- kong.yml: version-service route (/api/v1/versions), CORS origin for
  update.szaiai.com (mobile-upgrade frontend domain)

Deployment note: nginx needs /downloads/versions/ location + client_max_body_size 500m

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 07:48:31 -08:00
hailin 26369be760 docs: add detailed comments for thinking state indicator mechanism
voice-agent agent.py:
- Module docstring explains lk.agent.state lifecycle
  (initializing → listening → thinking → speaking)
- Explains how RoomIO publishes state as participant attribute
- Documents BackgroundAudioPlayer with all available built-in clips

Flutter agent_call_page.dart:
- Documents _agentState field and all possible values
- Documents ParticipantAttributesChanged listener with UI mapping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 06:13:54 -08:00
hailin f1d9210e1d fix: correct BackgroundAudioPlayer import path
Import from livekit.agents.voice.background_audio submodule directly,
as it's not re-exported from livekit.agents.voice.__init__.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 06:03:34 -08:00
hailin 33bd1aa3aa feat: add "thinking" state indicator for voice calls
- voice-agent: enable BackgroundAudioPlayer with keyboard typing sound
  during LLM thinking state (auto-plays when agent enters "thinking",
  stops when "speaking" starts)
- Flutter: monitor lk.agent.state participant attribute from LiveKit
  agent, show pulsing dots animation + "思考中..." ("Thinking...") text when thinking,
  avatar border changes to warning color with pulsing glow ring
- Both call mode and chat mode headers show thinking state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 05:45:04 -08:00
hailin 121ca5a5aa docs: add Speechmatics STT postmortem — all 4 modes failed, unusable
Detailed record of why livekit-plugins-speechmatics was removed:
- EXTERNAL: no FINAL_TRANSCRIPT (framework never sends FlushSentinel)
- ADAPTIVE: zero output (dual Silero VAD conflict)
- SMART_TURN: fragments Chinese speech into tiny pieces
- FIXED: finalize() async race condition with session teardown
All tested on 2026-03-03, none viable with LiveKit agents v1.4.4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 05:03:30 -08:00
hailin 7fb0d1de95 refactor: remove Speechmatics STT integration entirely, default to OpenAI
- Delete speechmatics_stt.py plugin
- Remove speechmatics branch from voice-agent entrypoint
- Remove livekit-plugins-speechmatics dependency
- Change default stt_provider to 'openai' in entity, controller, and UI
- Remove SPEECHMATICS_API_KEY from docker-compose.yml
- Remove speechmatics option from web-admin settings dropdown

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 04:58:38 -08:00
hailin 191ce2d6b3 fix: use FIXED mode with 1s silence trigger instead of SMART_TURN
SMART_TURN fragments continuous speech into tiny pieces, each triggering
an LLM request that aborts the previous one. FIXED mode waits for a
configurable silence duration (1.0s) before emitting FINAL_TRANSCRIPT
via the built-in END_OF_UTTERANCE handler.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 04:53:00 -08:00
hailin e8a3e07116 docs: add comprehensive Speechmatics STT integration notes
Document all findings from the integration process directly in the
source code for future reference:

1. Language code mapping: Speechmatics uses ISO 639-3 "cmn" for
   Mandarin, but LiveKit LanguageCode auto-normalizes it to "zh".
   Must override stt._stt_options.language after construction.

2. Turn detection modes (critical):
   - EXTERNAL: unusable — LiveKit never sends FlushSentinel, only
     pushes silence frames, so FINAL_TRANSCRIPT never arrives
   - ADAPTIVE: unusable — client-side Silero VAD conflicts with
     LiveKit's own VAD, produces zero transcription output
   - SMART_TURN: correct choice — server-side intelligent turn
     detection, auto-emits FINAL_TRANSCRIPT, fully compatible

3. Speaker diarization: is_active flag distinguishes primary speaker
   from TTS echo, solving the "speaker confusion" problem

4. Docker deployment: SPEECHMATICS_API_KEY in .env, watch for
   COPY layer cache when rebuilding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 04:47:33 -08:00
hailin f30aa414dd fix: use SMART_TURN mode per Speechmatics official recommendation
Replace EXTERNAL mode + monkey-patch hack with SMART_TURN mode.
SMART_TURN uses Speechmatics server-side turn detection that properly
emits AddSegment (FINAL_TRANSCRIPT) when the user finishes speaking.
No client-side finalize or debounce timer needed.

Ref: https://docs.speechmatics.com/integrations-and-sdks/livekit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 04:44:21 -08:00
hailin de99990c4d fix: text-based dedup to prevent duplicate FINAL_TRANSCRIPT emissions
Speechmatics re-sends identical partial segments during silence, causing
the debounce timer to fire multiple times with the same text. Each
duplicate FINAL aborts the in-flight LLM request and restarts it.

Replace time-based cooldown with text comparison: skip finalization if
the segment text matches the last finalized text. Also skip starting
new timers when partial text hasn't changed from last finalized.
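
The text-comparison rule described above can be sketched as a small guard object. This is a hedged Python illustration; the class and method names are hypothetical and do not come from the plugin code.

```python
from typing import Optional


class FinalDeduper:
    """Suppresses a FINAL whose text equals the last finalized text."""

    def __init__(self) -> None:
        self._last_final: Optional[str] = None

    def should_start_timer(self, partial_text: str) -> bool:
        """Skip the debounce timer when the partial repeats the last final."""
        text = partial_text.strip()
        return bool(text) and text != self._last_final

    def finalize(self, text: str) -> bool:
        """Return True if a FINAL should be emitted for this text."""
        text = text.strip()
        if not text or text == self._last_final:
            return False
        self._last_final = text
        return True
```

Compared with a time-based cooldown, this never drops a genuinely new utterance that arrives quickly, and it never lets an identical repeat through just because enough time has passed.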

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 04:40:00 -08:00
hailin 3b0119fe09 fix: reduce STT latency, add cooldown dedup, enable diarization
- Reduce debounce delay from 700ms to 400ms for faster response
- Add 1.5s cooldown after emitting FINAL to prevent duplicate triggers
  that cause LLM abort/retry cycles
- Enable speaker diarization (enable_diarization=True)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 03:20:12 -08:00
hailin 8ac1884ab4 fix: use debounce timer to auto-finalize Speechmatics partial transcripts
The LiveKit framework never sends FlushSentinel to the STT stream.
Instead it pushes silence frames and waits for FINAL_TRANSCRIPT events.
In EXTERNAL turn-detection mode, Speechmatics only emits partials.

New approach: each partial transcript restarts a 700ms debounce timer.
When partials stop (user stops speaking), the timer fires and promotes
the last partial to FINAL_TRANSCRIPT, unblocking the pipeline.
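
A minimal asyncio sketch of the debounce approach described above: every partial cancels the pending timer and starts a new one, and the last partial is promoted when the timer finally fires. The class name and callback shape are assumptions; only the 700ms delay comes from the message.

```python
import asyncio
from typing import Callable, Optional

DEBOUNCE_SECONDS = 0.7  # 700ms, as stated in the message


class PartialDebouncer:
    """Promotes the most recent partial to a final after a quiet period."""

    def __init__(self, on_final: Callable[[str], None],
                 delay: float = DEBOUNCE_SECONDS) -> None:
        self._on_final = on_final
        self._delay = delay
        self._last_partial = ""
        self._task: Optional[asyncio.Task] = None

    def on_partial(self, text: str) -> None:
        """Must be called from within a running event loop."""
        self._last_partial = text
        if self._task is not None:
            self._task.cancel()  # restart the timer on every partial
        self._task = asyncio.get_running_loop().create_task(self._fire())

    async def _fire(self) -> None:
        await asyncio.sleep(self._delay)
        self._on_final(self._last_partial)


async def demo() -> list:
    finals: list = []
    debouncer = PartialDebouncer(finals.append, delay=0.05)
    debouncer.on_partial("你")
    await asyncio.sleep(0.01)      # next partial arrives before the timer fires
    debouncer.on_partial("你好")
    await asyncio.sleep(0.2)       # silence: the timer fires once
    return finals
```

Running `asyncio.run(demo())` yields a single final containing the last partial, because the first timer is cancelled before it can fire.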

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 03:08:17 -08:00
hailin de3eccafd0 debug: add verbose logging to Speechmatics monkey-patch
Trace _patched_process_audio lifecycle and FlushSentinel handling
to diagnose why final transcripts are not being promoted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 02:50:04 -08:00
hailin 1431dc0c83 fix: directly promote partial transcripts to FINAL on FlushSentinel
VoiceAgentClient.finalize() schedules an async task chain that often
loses the race against session teardown. Instead, intercept partial
segments as they arrive, stash them, and synchronously emit them as
FINAL_TRANSCRIPT when FlushSentinel fires.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 02:16:46 -08:00
hailin 73fd56f30a fix: durable monkey-patch for Speechmatics finalize on flush
Move the SpeechStream._process_audio patch from container runtime
into our own source code so it survives Docker rebuilds. The patch
adds client.finalize() on FlushSentinel so EXTERNAL mode produces
final transcripts when LiveKit's VAD detects end of speech.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 02:00:42 -08:00
hailin 6707c5048d fix: use EXTERNAL mode + patch plugin to finalize on flush
EXTERNAL mode produces partial transcripts but livekit-plugins-speechmatics
does not call finalize() when receiving a flush sentinel from the framework.
A runtime monkey-patch on the plugin's SpeechStream._process_audio adds the
missing finalize() call so final transcripts are generated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 01:58:25 -08:00
hailin 8f951ad31c fix: use turn_detection=stt for Speechmatics per official docs
Speechmatics handles end-of-utterance natively via its Voice Agent
API (ADAPTIVE mode). Use turn_detection="stt" on AgentSession so
LiveKit delegates turn boundaries to the STT engine instead of
conflicting with its own VAD-based turn detection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 01:44:10 -08:00
hailin db4e70e30c fix: use EXTERNAL turn detection for Speechmatics in LiveKit pipeline
ADAPTIVE mode enables a second client-side Silero VAD inside the
Speechmatics SDK that conflicts with LiveKit's own VAD pipeline,
causing no transcription to be returned. EXTERNAL mode delegates
turn detection to LiveKit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 01:31:33 -08:00
hailin 9daf0e3b4f fix: bypass LanguageCode normalization that maps cmn back to zh
LiveKit's LanguageCode class normalizes ISO 639-3 codes to ISO 639-1
(cmn → zh), but Speechmatics API expects "cmn" not "zh". Override
the internal _stt_options.language after construction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 01:04:20 -08:00
hailin 7292ac6ca6 fix: use cmn instead of cmn_en for Speechmatics Voice Agent API
cmn_en bilingual code not supported by Voice Agent API, causes timeout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 00:19:50 -08:00
hailin 17ff9d3ce0 fix: use Speechmatics cmn_en bilingual model for Chinese-English mixed speech
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 23:57:26 -08:00
hailin 1d43943110 fix: correct Speechmatics STT language mapping and parameter name
- Map Whisper language codes (zh→cmn, en→en, etc.) to Speechmatics codes
- Fix parameter name: enable_partials → include_partials

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 23:56:37 -08:00
hailin e32a3a9800 fix: use @TenantId() decorator in VoiceConfigController for JWT tenant extraction
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 22:30:37 -08:00
hailin f9c47de04b feat: add STT provider switching (OpenAI ↔ Speechmatics) in settings
- Add VoiceConfig entity/repo/service/controller in agent-service
  for per-tenant STT provider persistence (default: speechmatics)
- Add Speechmatics STT plugin in voice-agent with livekit-plugins-speechmatics
- Modify voice-agent entrypoint for 3-way STT selection:
  metadata > agent-service config > env var fallback
- Add "Voice" section in web-admin settings page with STT provider dropdown
- Add i18n translations (en/zh) for voice settings
- Add SPEECHMATICS_API_KEY env var in docker-compose
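
The selection order "metadata > agent-service config > env var fallback" can be sketched as a simple first-non-empty lookup. This is a Python illustration with hypothetical key names (`stt_provider`, `STT_PROVIDER`); the actual keys used by the entrypoint are not given in the message.

```python
from typing import Mapping, Optional


def resolve_stt_provider(
    metadata: Mapping[str, str],
    tenant_config: Optional[Mapping[str, str]],
    env: Mapping[str, str],
    default: str,
) -> str:
    """Pick the STT provider using the precedence named in the commit.

    1. per-call dispatch metadata
    2. per-tenant config fetched from agent-service (may be unavailable)
    3. environment variable
    4. caller-supplied default
    """
    return (
        metadata.get("stt_provider")
        or (tenant_config or {}).get("stt_provider")
        or env.get("STT_PROVIDER")
        or default
    )
```

In production the `env` argument would typically be `os.environ`; passing it in explicitly keeps the function easy to test.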

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 22:13:18 -08:00
hailin 94e3153e39 chore: remove debug data_received logging 2026-03-02 06:50:41 -08:00
hailin 81e36bf859 debug: add data_received event logging to diagnose data channel 2026-03-02 06:38:02 -08:00
hailin 63b986fced fix: redesign voice call mixed-mode input with dual-layout architecture
Problem:
- Text input area caused BOTTOM OVERFLOWED BY 135 PIXELS when keyboard opened
- Input bar overlapped with call control buttons
- Sent messages were not displayed on screen (only SnackBar feedback)

Solution — split into two distinct layouts:

1. Call Mode (default):
   - Full-screen call UI: avatar, waveform, duration, large control buttons
   - Keyboard button in controls toggles to chat mode
   - No text input elements — clean voice-only interface

2. Chat Mode (tap keyboard button):
   - Compact call header: green status dot + "iAgent" + duration + inline
     mute/end/speaker/collapse controls
   - Scrollable message list (Expanded widget — properly handles keyboard)
   - User messages: right-aligned blue bubbles with attachment thumbnails
   - Agent responses: left-aligned gray bubbles with robot avatar
   - Input bar at bottom: attachment picker + text field + send button

Message display:
- User-sent text/attachments tracked in _messages list, shown as bubbles
- Agent responses sent back via LiveKit data channel (topic='text_reply')
  from voice-agent → Flutter, displayed as assistant bubbles
- Auto-scroll to latest message

Voice-agent change (agent.py):
- After session.say(response), publish response text back to Flutter via
  ctx.room.local_participant.publish_data() with topic='text_reply'
- Flutter listens for DataReceivedEvent to display agent responses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 06:11:07 -08:00
hailin ce63ece340 feat: add mixed-mode input (text + images + files) during voice calls
Enable users to send text messages, images, and files to the Agent
while an active voice call is in progress. This addresses the case
where spoken instructions are unclear or screenshots/documents need
to be shared for analysis.

## Architecture

Data flows through LiveKit data channel (not direct HTTP):
  Flutter → publishData(topic='text_inject') → voice-agent
  → llm.inject_text_message() → POST /api/v1/agent/tasks (same session)
  → collect streamed response → session.say() → TTS playback

This preserves the constraint that voice-agent owns the agent-service
sessionId — Flutter never contacts agent-service directly.

## Flutter UI (agent_call_page.dart)
- Add keyboard toggle button to active call controls (4-button row)
- Collapsible text input area with attachment picker (+) and send button
- Attachment support: gallery multi-select, camera, file picker
  (images max 1024x1024 quality 80%, PDF supported, max 5 attachments)
- Horizontal scrolling attachment preview with delete buttons
- 200KB payload size check before LiveKit data channel send
- Layout adapts: Spacer flex 1/3 toggle, reduced bottom padding

## voice-agent (agent.py)
- Register data_received event listener after session.start()
- Filter for topic='text_inject', parse JSON payload
- Call llm.inject_text_message(text, attachments) and TTS via session.say()
- Use asyncio.ensure_future() wrapper for async handler (matches
  existing disconnect handler pattern for sync EventEmitter)
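
The data-channel contract implied above (a JSON payload on topic 'text_inject', with a size check before sending) might look like the following sketch. The field names `text` and `attachments` are inferred from the inject_text_message(text, attachments) signature, and treating 200KB as 200 * 1024 bytes is an assumption. The real sender is the Flutter app; Python is used on both sides here only for illustration.

```python
import json
from typing import List, Optional, Tuple

TEXT_INJECT_TOPIC = "text_inject"
MAX_PAYLOAD_BYTES = 200 * 1024  # assumed meaning of the 200KB limit


def encode_text_inject(text: str, attachments: Optional[List[dict]] = None) -> bytes:
    """Sender side: build the payload and enforce the size limit."""
    payload = json.dumps(
        {"text": text, "attachments": attachments or []},
        ensure_ascii=False,
    ).encode("utf-8")
    if len(payload) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(payload)} bytes, over the limit")
    return payload


def decode_text_inject(topic: str, data: bytes) -> Optional[Tuple[str, list]]:
    """Receiver side: ignore other topics and malformed payloads."""
    if topic != TEXT_INJECT_TOPIC:
        return None
    try:
        message = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(message, dict):
        return None
    return message.get("text", ""), message.get("attachments", [])
```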

## AgentServiceLLM (agent_llm.py)
- New inject_text_message(text, attachments) method on AgentServiceLLM
- Reuses same _agent_session_id for conversation context continuity
- WS+HTTP streaming: connect, pre-subscribe, POST /tasks with
  attachments field, collect full text response, return string
- _injecting flag prevents concurrent _do_stream from clearing
  session ID on abort errors while inject is in progress
- Same systemPrompt/voiceMode/engineType as voice pipeline

No agent-service changes required — attachments already supported
end-to-end (JSONB storage → multimodal content blocks → Claude).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 05:38:04 -08:00
hailin 02aaf40bb2 fix: move voice instructions to systemPrompt, keep prompt clean
Previously, voice mode wrapped every user message with 【语音对话模式】 ("voice conversation mode")
instructions, polluting conversation_messages history with repeated
instructions on every turn. Now:

- systemPrompt carries voice-mode instructions (set once, not per-message)
- prompt contains only the clean user text (identical to text chat pattern)
- Conversation history stays clean for multi-turn context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 03:24:50 -08:00
hailin da17488389 feat: voice mode event filtering — skip tool/thinking events for Agent SDK
1. Remove on_enter greeting entirely (no more race condition)
2. voice-agent sends voiceMode: true when engine_type is claude_agent_sdk
3. AgentController.runTaskStream() filters thinking, tool_use, tool_result
   events in voice mode — only text, completed, error reach the client
4. Detailed logging: each event logged with [FILTERED-voice] tag when skipped

Claude API mode is completely unaffected (voiceMode defaults to false).
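
The filtering rule in item 3 can be sketched as a generator that drops the three event types only when voice mode is on. This is a Python illustration with a hypothetical function name and event shape (a dict with a `type` key); the controller itself is part of the TypeScript agent-service.

```python
from typing import Iterable, Iterator

# Event types named in the commit as filtered in voice mode.
VOICE_FILTERED_TYPES = frozenset({"thinking", "tool_use", "tool_result"})


def filter_events(events: Iterable[dict], voice_mode: bool = False) -> Iterator[dict]:
    """Yield events for the client, skipping filtered types in voice mode."""
    for event in events:
        if voice_mode and event["type"] in VOICE_FILTERED_TYPES:
            continue  # the commit logs these with a [FILTERED-voice] tag
        yield event
```

With `voice_mode` left at its default of `False`, every event passes through unchanged, which matches the statement that Claude API mode is unaffected.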

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 02:56:41 -08:00
hailin 7c9fabd891 fix: avoid Agent SDK race on greeting + clear session on abort
1. Change on_enter greeting from generate_reply() to session.say() with
   a static message — avoids spawning an Agent SDK task just for a greeting,
   which caused a race condition when the user speaks before it completes.

2. Clear agent session ID when receiving abort/exit errors so the next
   task starts a fresh session instead of trying to resume a dead process.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 02:22:52 -08:00
hailin a78e2cd923 chore: add detailed engine type logging for verification
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 02:18:29 -08:00
hailin 59a3e60b82 feat: add engine type selection (Agent SDK / Claude API) for voice calls
Full-stack implementation allowing users to choose between Claude Agent SDK
(default, with tool approval, skill injection, session resume) and Claude API
(direct, lower latency) in Flutter settings. Agent SDK mode wraps prompts with
voice-conversation instructions for concise spoken Chinese output.

Data flow: Flutter Settings → SharedPreferences → POST /livekit/token →
RoomAgentDispatch metadata → voice-agent → AgentServiceLLM(engine_type)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 02:11:51 -08:00
hailin e66c187353 fix: improve voice pipeline robustness for poor network conditions
Flutter (agent_call_page.dart):
- Add ConnectOptions with 15s timeouts for connection/peerConnection/iceRestart
- Add RoomReconnectingEvent/RoomAttemptReconnectEvent/RoomReconnectedEvent
  listeners with "网络重连中" ("Reconnecting to network") UI indicator during reconnection
- Add TimeoutException detection in _friendlyError()

voice-agent (agent.py):
- Wrap entrypoint() in try-except with full traceback logging
- Register room disconnect listener to close httpx clients (instead of
  finally block, since session.start() returns while session runs in bg)
- Add asyncio import for ensure_future cleanup

voice-agent LLM proxy (agent_llm.py):
- Add retry with exponential backoff (max 2 retries, 1s/3s delays) for
  network errors (ConnectError/ConnectTimeout/OSError) and WS InvalidStatusCode
- Extract _do_stream() method for single-attempt logic
- Add WebSocket connection params: open_timeout=10, ping_interval=20,
  ping_timeout=10 for keepalive and faster dead-connection detection
- Use granular httpx.Timeout(connect=10, read=30, write=10, pool=10)
- Increase WS recv timeout from 5s to 30s to reduce unnecessary loops
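
The retry policy above (at most 2 retries, with 1s then 3s delays) can be sketched as a small wrapper around a single-attempt coroutine. The function name and the injectable `sleep` are assumptions, and `OSError` stands in for the httpx and websockets exception types listed in the message.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")

RETRY_DELAYS = (1.0, 3.0)  # two retries, delays taken from the message


async def with_retries(
    attempt: Callable[[], Awaitable[T]],
    delays: Sequence[float] = RETRY_DELAYS,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run attempt(); on a retryable error wait and try again.

    Makes len(delays) + 1 attempts in total, then re-raises the last error.
    """
    for delay in (*delays, None):
        try:
            return await attempt()
        except retry_on:
            if delay is None:
                raise  # retries exhausted
            await sleep(delay)
    raise RuntimeError("unreachable")
```

Splitting the single attempt out of the retry loop is what the `_do_stream()` extraction in the list above makes possible: the wrapper does not need to know anything about WebSocket or HTTP details.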

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 23:34:55 -08:00
hailin 32922c6819 fix: adjust TTS default instructions for faster speech tempo
Changed from "语速适中" ("moderate pace") to "语速稍快,简洁干练"
("slightly faster, concise and crisp") to reduce perceived
latency in voice conversations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 22:09:32 -08:00
hailin 186234bae2 fix: increase STT silence_duration_ms to prevent choppy transcription
Default silence_duration_ms=350 is too aggressive for Chinese speech,
causing sentences to be fragmented into 1-3 character chunks. Increase
to 800ms and raise VAD threshold to 0.6 so the STT waits longer before
finalizing a turn, producing complete sentences for LLM processing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 18:37:13 -08:00
hailin a5c95b460a fix: patch aiohttp SSL verification for OpenAI Realtime STT WebSocket
The OpenAI Realtime STT uses aiohttp WebSocket connections (not httpx),
so the existing httpx verify=False fix does not apply. LiveKit's
http_context creates aiohttp.TCPConnector without ssl=False, causing
SSL certificate verification errors when OPENAI_BASE_URL points to a
proxy with a self-signed certificate.

Monkey-patch http_context._new_session_ctx to inject ssl=False into the
aiohttp connector, fixing the "CERTIFICATE_VERIFY_FAILED" error for
Realtime STT WebSocket connections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 18:29:59 -08:00
hailin 5460be8c04 feat: add TTS voice and style settings to Flutter app
Add user-configurable TTS voice and tone style settings that flow from
the Flutter app through the backend to the voice-agent at call time.

## Flutter App (it0_app)

### Domain Layer
- app_settings.dart: Add `ttsVoice` (default: 'coral') and `ttsStyle`
  (default: '') fields to AppSettings entity with copyWith support

### Data Layer
- settings_datasource.dart: Add SharedPreferences keys
  `settings_tts_voice` and `settings_tts_style` for local persistence
  in loadSettings(), saveSettings(), and clearSettings()

### Presentation Layer
- settings_providers.dart: Add `setTtsVoice()` and `setTtsStyle()`
  methods to SettingsNotifier for Riverpod state management
- settings_page.dart: Add "语音" ("Voice") settings group between Notifications
  and Security groups with:
  - Voice picker: 13 OpenAI voices with gender/style labels
    (e.g. "女 · 温暖" (female, warm), "男 · 沉稳" (male, steady),
    "中性" (neutral)) in a BottomSheet
  - Style picker: 5 presets (专业干练/温柔耐心/轻松活泼/严肃正式/科幻AI, i.e.
    professional and crisp / gentle and patient / relaxed and lively /
    serious and formal / sci-fi AI)
    as ChoiceChips + custom text input field + reset button

### Call Flow
- agent_call_page.dart: Send `tts_voice` and `tts_style` in the POST
  body when requesting a LiveKit token at call initiation

## Backend

### voice-service (Python/FastAPI)
- livekit_token.py: Accept optional `tts_voice` and `tts_style` via
  Pydantic TokenRequest body model; embed them in RoomAgentDispatch
  metadata JSON alongside auth_header (backward compatible)

### voice-agent (Python/LiveKit Agents)
- agent.py: Extract `tts_voice` and `tts_style` from ctx.job.metadata;
  use them when creating openai_plugin.TTS() — user-selected voice
  overrides config default, user-selected style overrides default
  instructions. Falls back to config defaults when not provided.

## Data Flow
Flutter Settings → SharedPreferences → POST /livekit/token body →
voice-service embeds in RoomAgentDispatch metadata →
voice-agent reads from ctx.job.metadata → TTS creation
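
The metadata hand-off above can be sketched as follows. This is a minimal, stdlib-only illustration: `TokenRequest` here is a plain dataclass standing in for the Pydantic model, and `build_dispatch_metadata`, `resolve_tts_options`, and `DEFAULT_INSTRUCTIONS` are hypothetical names, not the project's actual code.

```python
import json
from dataclasses import dataclass
from typing import Optional

DEFAULT_VOICE = "coral"                     # default voice named in this commit
DEFAULT_INSTRUCTIONS = "Speak naturally."   # placeholder, not the real default


@dataclass
class TokenRequest:
    """Hypothetical stand-in for the optional request body fields."""
    tts_voice: Optional[str] = None
    tts_style: Optional[str] = None


def build_dispatch_metadata(auth_header: str, req: TokenRequest) -> str:
    """voice-service side: serialize per-call settings next to auth_header."""
    payload = {"auth_header": auth_header}
    # Only include the keys when set, so older agents see the same payload.
    if req.tts_voice:
        payload["tts_voice"] = req.tts_voice
    if req.tts_style:
        payload["tts_style"] = req.tts_style
    return json.dumps(payload)


def resolve_tts_options(raw_metadata: str) -> tuple[str, str]:
    """voice-agent side: user choices win, otherwise fall back to defaults."""
    try:
        meta = json.loads(raw_metadata) if raw_metadata else {}
    except json.JSONDecodeError:
        meta = {}
    if not isinstance(meta, dict):
        meta = {}
    voice = meta.get("tts_voice") or DEFAULT_VOICE
    instructions = meta.get("tts_style") or DEFAULT_INSTRUCTIONS
    return voice, instructions
```

Omitting unset keys rather than sending nulls is one way to stay backward compatible: a client that never sends the new fields produces the same metadata as before.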

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 09:38:15 -08:00
hailin 705647d732 feat: upgrade TTS to gpt-4o-mini-tts with voice instructions
- Switch from tts-1 to gpt-4o-mini-tts for lower latency and better quality
- Change voice from alloy to coral
- Add Chinese speech instructions for natural tone control

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 08:19:05 -08:00
hailin ba83e433d3 feat: enable OpenAI Realtime STT for streaming speech recognition
Switch from batch STT (gpt-4o-transcribe via /audio/transcriptions)
to streaming Realtime API (WebSocket). This eliminates the ~2s batch
upload+process latency per utterance.

Also updated nginx proxy on 67.223.119.33 to support WebSocket upgrade
for /v1/realtime endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 07:49:25 -08:00
hailin e302891f16 fix: disable SSL verify for self-signed OpenAI proxy + handle no-user-msg
- Pass httpx.AsyncClient(verify=False) to OpenAI STT/TTS to support
  self-signed certificate on OPENAI_BASE_URL proxy
- Handle generate_reply calls with no user message by falling back to
  system/developer instructions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:39:49 -08:00
hailin 4d47c6a955 fix: remove wait_for_participant — room not connected in rtc_session mode
In livekit-agents v1.x @server.rtc_session() pattern, ctx.room is not
yet connected when entrypoint is called. session.start() handles room
connection internally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:15:37 -08:00
hailin 2112445191 fix: voice-agent crash — add room I/O options and filter AgentConfigUpdate
- Add room_input_options/room_output_options to session.start() so agent
  binds audio I/O and stays in the room
- Add wait_for_participant() before starting session
- Filter AgentConfigUpdate items in agent_llm.py (no 'role' attribute)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:08:07 -08:00
hailin 00be878a95 fix: refactor voice-agent to official LiveKit v1.x AgentServer pattern
Replace deprecated WorkerOptions(entrypoint_fnc=...) with AgentServer() +
@server.rtc_session() decorator. Use server.setup_fnc for prewarm. Remove
manual ctx.connect() and ctx.wait_for_participant() calls that prevented
the pipeline from properly wiring up VAD→STT→LLM→TTS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 12:31:31 -08:00
hailin 75b14d5200 fix: use RoomOptions instead of deprecated RoomInputOptions
RoomInputOptions is deprecated in livekit-agents 1.4.x. Switch to
RoomOptions with explicit audio_input/audio_output enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:32:36 -08:00
hailin 23b5bce983 fix: extract auth header from job.metadata instead of agent_dispatch
LiveKit passes RoomAgentDispatch metadata through as job.metadata
(protobuf field), not via a separate agent_dispatch object. Also
use room_io.RoomInputOptions for participant targeting (livekit-agents 1.x).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 11:04:02 -08:00
hailin f1d50e43f1 fix: update AgentSession.start() for livekit-agents 1.x API
livekit-agents 1.x removed the 'participant' parameter from
AgentSession.start(). Use room_input_options with participant_identity
instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 10:31:04 -08:00
hailin acfdae7773 fix: use livekit-api package for voice-service token endpoint
The livekit package is the client SDK and doesn't include the server-side
API module. Switch to livekit-api which provides AccessToken, VideoGrants,
RoomAgentDispatch, and RoomConfiguration needed for token generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 09:49:11 -08:00
hailin 112c445143 fix: resolve websockets version conflict and use CPU-only torch
- Upgrade websockets from ==12.0 to >=13.0 (openai[realtime] requires >=13)
- Install torch CPU-only build separately in Dockerfile to avoid ~2GB CUDA download
- Remove torch from requirements.txt (installed via --index-url cpu wheel)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 09:02:31 -08:00
hailin 94a14b3104 feat: migrate voice call from WebSocket/PCM to LiveKit WebRTC
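Real-time voice conversation architecture migration: WebSocket → LiveKit WebRTC

## Background
The original voice call architecture sent raw PCM over a FastAPI WebSocket
and ran the pipeline serially (VAD → batch STT → Agent → sentence
accumulation → batch TTS), with a time-to-first-audio of about 6 seconds.
After migrating to the LiveKit Agents framework, WebRTC transport plus
pipeline parallelism are expected to reduce latency to 1.5-2 seconds.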

## Architecture
Flutter App ←── WebRTC (Opus/UDP) ──→ LiveKit Server ←──→ Voice Agent
  livekit_client                      (self-hosted, Go)  (Python, LiveKit Agents SDK)
                                                          ├─ VAD (Silero)
                                                          ├─ STT (faster-whisper / OpenAI)
                                                          ├─ LLM (custom plugin → agent-service)
                                                          └─ TTS (Kokoro / OpenAI)

Key design: the LLM does not call the Claude API directly. A custom plugin
proxies requests to the existing agent-service, preserving Tool Use,
conversation history, tenant isolation, and related capabilities.

## New Services

### voice-agent (packages/services/voice-agent/)
LiveKit Agent Worker, consisting of:
- agent.py: entry point; prewarm() preloads models, entrypoint() orchestrates the session
- plugins/agent_llm.py: custom LLM plugin that proxies the agent-service API
  - POST /api/v1/agent/tasks creates a task
  - WS /ws/agent subscribes to streaming events (stream_event)
  - Reuses session_id across turns to keep conversation context
- plugins/whisper_stt.py: local faster-whisper STT (batch recognition)
- plugins/kokoro_tts.py: local Kokoro-82M TTS (24kHz PCM)
- config.py: pydantic-settings configuration

### LiveKit Server (deploy/docker/)
- livekit.yaml: signaling port 7880, RTC TCP 7881, UDP 50000-50200
- docker-compose.yml: adds livekit-server + voice-agent containers

### LiveKit Token Endpoint
- voice-service/src/api/livekit_token.py:
  POST /api/v1/voice/livekit/token
  Generates the room JWT and embeds auth_header in the AgentDispatch metadata

## Flutter Client Changes
- agent_call_page.dart: simplified from ~814 lines to ~380 lines
  - Replaced: WebSocketChannel, AudioRecorder, PcmPlayer, manual heartbeat/reconnect
  - Now uses: Room.connect(), setMicrophoneEnabled(true), LiveKit event listeners
  - Waveform animation now driven by participant.audioLevel
- pubspec.yaml: add livekit_client: ^2.3.0
- app_config.dart: add livekitUrl field
- api_endpoints.dart: add livekitToken endpoint

## Configuration (environment variables)
- STT_PROVIDER: local (default, faster-whisper) / openai
- TTS_PROVIDER: local (default, Kokoro) / openai
- WHISPER_MODEL: base (default) / small / medium / large
- WHISPER_LANGUAGE: zh (default)
- KOKORO_VOICE: zf_xiaoxiao (default)
- DEVICE: cpu (default) / cuda

## Unchanged
- agent-service: untouched; voice-agent calls it through the existing API
- voice-service core: pipeline/STT/TTS/VAD retained (as Twilio fallback)
- Kong gateway: existing routes unchanged
- Database: no schema changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 08:55:33 -08:00
hailin 4987cad881 fix: increase body parser limit to 50mb for large PDF uploads
Claude API supports up to 32MB PDFs; base64 encoding adds ~33% overhead.
50mb body limit covers the maximum single-document upload case.
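
The sizing argument can be checked with a few lines of arithmetic. This sketch assumes the parser's "50mb" means 50 × 1024 × 1024 bytes and ignores the small JSON envelope around the base64 string; `base64_size` is an illustrative helper.

```python
MIB = 1024 * 1024


def base64_size(raw_bytes: int) -> int:
    """Length of padded base64 output: 4 characters per 3 input bytes."""
    return (raw_bytes + 2) // 3 * 4


pdf_limit = 32 * MIB                 # largest PDF mentioned in the commit
encoded = base64_size(pdf_limit)     # size of that PDF once base64-encoded
body_limit = 50 * MIB                # assumed meaning of the "50mb" setting

print(f"encoded PDF: {encoded / MIB:.2f} MiB")
print(f"headroom:    {(body_limit - encoded) / MIB:.2f} MiB")
```

A 32 MiB file encodes to roughly 42.7 MiB, which leaves a little over 7 MiB for the rest of the request body.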

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 05:35:43 -08:00
hailin c9367ee22a fix: PDF attachments sent as document blocks instead of image blocks
PDF files were incorrectly wrapped as type:'image' content blocks,
causing Claude API to reject them as "Invalid image data".

- conversation-context.service: check mediaType for application/pdf,
  use type:'document' block (Anthropic native PDF support) instead
- claude-agent-sdk-engine: detect both 'image' and 'document' blocks
  when deciding to build multimodal SDK prompt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 05:27:41 -08:00
hailin 2c657e2b4c fix: use NestJS native useBodyParser instead of direct express import
The direct `import * as express from 'express'` caused a
MODULE_NOT_FOUND error in the Docker production image since express
is only available as a transitive dependency via @nestjs/platform-express.
Use NestExpressApplication.useBodyParser() which is the official NestJS API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 04:01:54 -08:00
hailin b9c3bfdf91 feat: add multimodal image support to Claude Agent SDK engine
- SDK engine now constructs AsyncIterable<SDKUserMessage> with image
  content blocks when attachments are present in conversationHistory,
  using the SDK's native multimodal prompt format
- CLI engine logs a warning when images are detected, since the `-p`
  flag only accepts text (upstream Claude CLI limitation)
- Both SDK and API engines now fully support multimodal image input

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 03:38:59 -08:00
hailin e4c2505048 feat: add multimodal image input with streaming markdown optimization
Two major features in this commit:

1. Streaming Markdown Rendering Optimization
   - Replace deprecated flutter_markdown with gpt_markdown (active, AI-optimized)
   - Real-time markdown rendering during streaming (was showing raw syntax)
   - Solid block cursor (█) instead of AnimationController blink
   - 80ms token throttle buffer reducing rebuilds from per-token to ~12.5/sec
   - RepaintBoundary isolation for markdown widget repaints
   - StreamTextWidget simplified from StatefulWidget to StatelessWidget

2. Multimodal Image Input (camera + gallery + display)
   - Flutter: image_picker for gallery/camera, base64 encoding, attachment
     preview strip with delete, thumbnails in sent messages
   - Data layer: List<String>? → List<Map<String, dynamic>>? for structured
     attachment payloads through datasource/repository/usecase
   - ChatAttachment model with base64Data, mediaType, fileName
   - ChatMessage entity + ChatMessageModel both support attachments field
   - Backend DTO, Entity (JSONB), Controller, ConversationContextService
     all extended to receive, store, and reconstruct Anthropic image
     content blocks in loadContext()
   - Claude API engine skips duplicate user message when history already
     ends with multimodal content blocks
   - NestJS body parser limit raised to 10MB for base64 image payloads
   - Android CAMERA permission added to manifest
   - Image.memory uses cacheWidth/cacheHeight for memory efficiency
   - Max 5 images per message enforced in UI

Data flow:
  ImagePicker → base64Encode → ChatAttachment → POST body →
  DB (JSONB) → loadContext → Anthropic image content blocks → Claude API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 03:24:17 -08:00
hailin 50dbb641a3 fix: comprehensive hardening of agent task cancel/inject/approve flows
6 rounds of systematic audit identified and fixed 14 bugs across
backend controller and Flutter client:

## Backend (agent.controller.ts)

Security & Tenant Isolation:
- Add @TenantId + ForbiddenException check to cancelTask, injectMessage,
  approveCommand — all 4 write endpoints now enforce tenant isolation
- Add tenantId check on session reuse in executeTask to prevent
  cross-tenant session hijacking

Architecture & Correctness:
- Extract shared runTaskStream() from inline fire-and-forget block,
  used by both executeTask and injectMessage to reduce duplication
- Use session.engineType (not getActiveEngine()) in cancelTask,
  injectMessage, approveCommand — fixes wrong-engine-cancel when
  global engine config is switched after task creation
- Add concurrent task prevention: executeTask checks for existing
  RUNNING task on same session and cancels it before starting new one
- Add runningTasks Map to track task promises, awaitTaskCleanup()
  helper with 3s timeout for inject to wait for partial text save
- captureSdkSessionId() captures SDK session ID into metadata
  without DB save (callers persist), preventing fire-and-forget race

Cancel/Reject Improvements:
- cancelTask: idempotent (returns early if already CANCELLED/COMPLETED),
  session stays 'active' (was 'cancelled'), emits cancelled WS event
- approveCommand reject: session stays 'active' (was 'cancelled'),
  now emits cancelled WS event so Flutter stream listeners clean up
- approveCommand approved: collect text events and save assistant
  response to conversation history on completion (was missing)

Minor:
- task.result! non-null assertion → task.result ?? 'Unknown error'
- Add findRunningBySessionId() to TaskRepository

## Flutter

API Contract Fix:
- approveCommand: route changed from /api/v1/ops/approvals/:id/approve
  to /api/v1/agent/tasks/:id/approve with {approved: true} body
- rejectCommand: route changed from /api/v1/ops/approvals/:id/reject
  to /api/v1/agent/tasks/:id/approve with {approved: false} body

Resource Management:
- ChatNotifier.dispose() now disconnects WebSocket to prevent
  connection leak when navigating away from chat

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 22:20:46 -08:00
hailin d5f663f7af feat: inject-message support for mid-stream task interruption
Backend (agent-engine.port.ts):
- Add `cancelled` event type: emitted when a task is cancelled (user-initiated
  or injection), so Flutter can close the old stream cleanly
- Add `task_info` event type: emitted after inject to pass the new taskId to
  the client, enabling cancel/re-inject on the replacement task

Flutter (features/chat/):
- ChatState: track current `taskId` alongside `sessionId`; clear on completion
  or error
- Handle `TaskInfoEvent`: update taskId in state when server issues a new task
- Handle `CancelledEvent`: treat as stream termination (agentStatus → idle)
- MessageType.interrupted: new UI node (warning style) for mid-stream cancels
- _inject(): send text as an inject request while streaming; backend cancels
  the current task and starts a new one with the injected message
- Input area: during streaming, hint changes to "追加指令..." ("Add instruction..."),
  Enter key calls _inject() instead of _send(), and both inject-send + stop
  buttons are shown
- isAwaitingApproval kept separate from isStreaming so approval flow is not
  blocked by inject mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 21:33:50 -08:00
hailin ce4e7840ec fix: route AgentSkillService to per-tenant schema to match SDK engine
Previously AgentSkillService wrote skills to public.agent_skills (TypeORM
entity with tenantId column filter), while ClaudeAgentSdkEngine read from
it0_t_{tenantId}.skills (per-tenant schema). The two tables were never
connected, so any skill added via the CRUD API was invisible to the agent.

This fix:
- Rewrites AgentSkillService to use DataSource + raw SQL against the
  per-tenant schema it0_t_{tenantId}.skills
- Maps API fields: script→content, enabled→is_active
- Removes AgentSkillRepository and AgentSkill entity from module (no longer needed)
- CRUD API response shape is unchanged (fields mapped back to script/enabled)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 21:21:36 -08:00
hailin 3278696f4c feat: inject tenant skills into agent system prompt
Load active skills from the tenant's schema `skills` table and append
them to the system prompt before passing to the Claude Agent SDK. This
closes the gap where skills existed in the DB but were never surfaced
to the agent during task execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 20:42:15 -08:00
hailin 36d36acad4 fix: set tenantId when creating credentials in inventory-service
The createCredential method was missing the tenantId assignment,
causing a NOT NULL constraint violation on the credentials table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:52:14 -08:00
hailin 51b348e609 feat: complete tenant member management (CRUD + delete tenant)
Backend: add 5 missing endpoints to TenantController:
- DELETE /tenants/:id (deprovision schema + cleanup)
- GET /tenants/:id/members (query tenant schema users)
- PATCH /tenants/:id/members/:memberId (change role)
- DELETE /tenants/:id/members/:memberId (remove member)
- PUT /tenants/:id (alias for frontend compatibility)

Frontend: add member actions to tenant detail page:
- Role column changed to dropdown selector
- Added remove member button with confirmation
- Added updateMember and removeMember mutations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:00:09 -08:00
hailin bc7e32061a fix: improve voice call reconnection robustness
Server side (session_router.py):
- /reconnect now accepts sessions in "active" state (not just "disconnected")
- When client reconnects to an active session, the old WebSocket/pipeline is
  automatically replaced when the new WebSocket connects
- Only truly terminal states (e.g. "ended") return 409

Flutter side (agent_call_page.dart):
- Distinguish terminal errors (404 session gone, 409 ended) from transient
  errors (network timeout, server unreachable) in reconnect loop
- Terminal errors break immediately instead of wasting retry attempts
- Extract _connectWebSocket() helper for cleaner reconnect flow
- Add DioException handling for proper HTTP status code inspection
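
The terminal-versus-transient distinction can be sketched like this. It is an illustration in Python rather than the Dart client code; `ReconnectError`, `run_reconnect_loop`, and the return labels are hypothetical names, and only the 404/409 status codes come from the commit.

```python
from typing import Callable, Optional

# Statuses the commit treats as terminal: session gone (404) or ended (409).
TERMINAL_STATUSES = {404, 409}


class ReconnectError(Exception):
    """Hypothetical error type; status is None for network-level failures."""

    def __init__(self, status: Optional[int] = None) -> None:
        super().__init__(f"reconnect failed (status={status})")
        self.status = status


def run_reconnect_loop(attempt: Callable[[], None], max_attempts: int = 5) -> str:
    """Retry transient failures, but stop at once on a terminal status."""
    for _ in range(max_attempts):
        try:
            attempt()
            return "connected"
        except ReconnectError as err:
            if err.status in TERMINAL_STATUSES:
                return "terminal"      # retrying cannot help
            # Transient (timeout, unreachable server): fall through and retry.
    return "exhausted"
```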

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 07:33:34 -08:00
hailin 75083f23aa debug: add TTS send_bytes logging to pipeline
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 06:19:18 -08:00
hailin 5be7f9c078 fix: resample OpenAI TTS output from 24kHz to 16kHz WAV
OpenAI TTS returns 24kHz audio which Android MediaPlayer can't play
via FlutterSound's pcm16WAV codec. Request raw PCM and resample to
16kHz before wrapping in WAV header, matching the local TTS format.
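
A minimal sketch of the resample-then-wrap step, using only the standard library. The commit does not say which resampling method is used; linear interpolation is an assumption chosen for brevity, and it applies no anti-aliasing filter. `resample_pcm16` and `pcm16_to_wav` are illustrative names.

```python
import array
import io
import sys
import wave


def resample_pcm16(pcm: bytes, src_rate: int = 24000, dst_rate: int = 16000) -> bytes:
    """Resample little-endian 16-bit mono PCM by linear interpolation."""
    src = array.array("h")
    src.frombytes(pcm)
    if sys.byteorder == "big":
        src.byteswap()                      # input is little-endian PCM
    if not src:
        return b""
    n_out = len(src) * dst_rate // src_rate
    out = array.array("h", bytes(2 * n_out))
    step = src_rate / dst_rate              # 1.5 source samples per output sample
    last = len(src) - 1
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        frac = pos - j
        nxt = src[min(j + 1, last)]
        out[i] = int(round(src[j] * (1 - frac) + nxt * frac))
    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()


def pcm16_to_wav(pcm: bytes, rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

For production audio, a filtered resampler from a dedicated audio library would likely sound better; the point here is only the 3:2 rate conversion and the 16kHz WAV header.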

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 05:38:39 -08:00
hailin 4456550393 feat: lazy-load local TTS/STT models on first request
Local /synthesize and /transcribe endpoints now auto-load Kokoro/Whisper
models on first call instead of returning 503 when not pre-loaded at
startup. This allows switching between Local and OpenAI providers in the
Flutter test page without requiring server restart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 04:38:49 -08:00
hailin cc0f06e2be feat: SDK engine native resume with per-tenant HOME isolation
Replace prompt-prefix workaround with SDK's native resume mechanism.
Each tenant gets isolated HOME directory (/data/claude-tenants/{tenantId})
to prevent cross-tenant session file mixing. SDK session IDs are persisted
in session.metadata for cross-request resume support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 02:27:38 -08:00
hailin 2403ce5636 feat: multi-turn conversation context management with session history UI
Implement DB-based conversation message storage (engine-agnostic) that
works across both Claude API and Agent SDK engines. Add ChatGPT/Claude-style
conversation history drawer in Flutter with date-grouped session list,
session switching, and new chat functionality.

Backend: entity, repository, context service, migration 004, session/message
API endpoints. Flutter: ConversationDrawer, sessionId flow from backend
response via SessionInfoEvent, session list/switch/delete support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 19:04:35 -08:00
hailin c02c2a9a11 feat: add OpenAI TTS/STT provider support in voice pipeline
- Add STT_PROVIDER/TTS_PROVIDER config (local or openai) in settings
- Pipeline uses OpenAI API for STT/TTS when provider is "openai"
- Skip loading local models (Kokoro/faster-whisper) when using OpenAI
- VAD (Silero) always loads for speech detection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 09:27:38 -08:00
hailin f8f0d17820 fix: disable SSL verification for OpenAI proxy with self-signed cert
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 08:59:06 -08:00
hailin d43baed3a5 feat: add OpenAI TTS/STT API endpoints for comparison testing
- Add openai package to voice-service requirements
- Add /api/v1/test/tts/synthesize-openai (tts-1/tts-1-hd/gpt-4o-mini-tts)
- Add /api/v1/test/stt/transcribe-openai (gpt-4o-transcribe/whisper-1)
- Add OPENAI_API_KEY and OPENAI_BASE_URL env vars to voice-service
- Flutter test page: SegmentedButton to toggle Local/OpenAI provider
- All endpoints maintain same response format for easy comparison

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 07:20:03 -08:00
hailin 5d4fd96d43 feat: streaming claude-api engine, engineType override, fix voice test page
- Claude API engine now uses streaming API (messages.stream) for real-time
  text delta output instead of waiting for full response
- Agent controller accepts optional engineType body parameter to allow
  callers (e.g. voice pipeline) to select a specific engine
- Fix voice_test_page.dart compilation error: replace audioplayers (not
  installed) with flutter_sound (already in pubspec.yaml)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 05:30:11 -08:00
hailin 0bd050c80f feat: add STT test and round-trip test to voice test page
- STT: record from mic or upload audio file → faster-whisper transcription
- Round-trip: record → STT → TTS → playback (full pipeline test)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 05:08:00 -08:00
hailin 0aa20cbc73 feat: add temporary TTS test page at /api/v1/test/tts
Browser-accessible page to test text-to-speech synthesis without
going through the full voice pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 05:06:02 -08:00
hailin 740f8f5f88 fix: sentence splitting bug in voice pipeline TTS streaming
When the first punctuation mark appeared before _MIN_SENTENCE_LEN chars,
the regex search would always find it first and skip it, permanently
blocking all subsequent sentence splits. Fix by advancing search_start
past short matches instead of breaking out of the loop.
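
The fixed loop can be sketched as below. The terminator set and the value of `_MIN_SENTENCE_LEN` are illustrative assumptions, and `split_sentences` is a hypothetical name; only the idea of advancing `search_start` past a too-short match comes from this commit.

```python
import re

_SENTENCE_END = re.compile(r"[。！？.!?]")   # illustrative terminator set
_MIN_SENTENCE_LEN = 4                        # illustrative threshold


def split_sentences(buffer: str) -> tuple[list[str], str]:
    """Return (complete sentences, leftover text with no terminator yet)."""
    sentences: list[str] = []
    start = 0          # where the sentence being accumulated begins
    search_start = 0   # where the next regex search begins
    while True:
        match = _SENTENCE_END.search(buffer, search_start)
        if match is None:
            break
        end = match.end()
        if end - start < _MIN_SENTENCE_LEN:
            # Too short to synthesize alone: keep the text in the current
            # sentence and search past this mark instead of stopping.
            search_start = end
            continue
        sentences.append(buffer[start:end].strip())
        start = search_start = end
    return sentences, buffer[start:]
```

With the old behavior, an early short match such as "Ok." would be found again on every search, so nothing after it was ever split; here it is simply merged into the next sentence.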

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 05:03:05 -08:00
hailin 79fae0629e chore: upgrade claude-agent-sdk to ^0.2.52
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 04:12:03 -08:00
hailin 2a150dcff5 fix: prevent error event from overriding completed status in controller
Add finished guard so that once a task reaches completed/error terminal
state, subsequent events don't flip the status back.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:49:21 -08:00
hailin 8e4bd573f4 fix: deduplicate text events from SDK stream_event and assistant message
SDK sends text both via stream_event deltas (token-level) and assistant
message (complete block). Track hasStreamedText flag per session to skip
duplicate text extraction from assistant messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:31:48 -08:00
hailin 65e68a0487 feat: streaming TTS — synthesize per-sentence as agent tokens arrive
Replace batch TTS (wait for full response) with streaming approach:
- _agent_generate → _agent_stream async generator (yield text chunks)
- _process_speech accumulates tokens, splits on sentence boundaries
- Each sentence is TTS'd and sent immediately while more tokens arrive
- First audio plays within ~1s of agent response vs waiting for full text

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:14:22 -08:00
hailin aa2a49afd4 fix: extract text from assistant message + fix event data parsing
Root causes found:
1. SDK engine only emitted 'completed' without 'text' events because
   mapSdkMessage skipped text blocks in 'assistant' messages (assumed
   stream_event deltas would provide them, but SDK didn't send deltas)
2. Voice pipeline read evt_data.data.content but engine events are flat
   (evt_data.content) — so even if text arrived, it was never extracted

Fixes:
- Extract text/thinking blocks from assistant messages in SDK engine
- Fix voice pipeline to read content directly from evt_data, not nested

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:01:25 -08:00
hailin a7b42e6b98 feat: add detailed logging to agent engine and task controller
Log every SDK message type, event emission, and stream lifecycle
to diagnose why text events are missing in voice-agent flow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:56:09 -08:00
hailin 0dbe711ed3 feat: add detailed logging to voice pipeline (STT/Agent/TTS timing)
Log timestamps, content, and event details at each pipeline stage
to help diagnose voice-agent integration issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:47:21 -08:00
hailin 1d5c834dfe feat: add event buffering to agent WS gateway for late subscribers
Buffer stream events when no WS clients are subscribed yet, then replay
them when a client subscribes. This eliminates the race condition where
events are lost between task creation and WS subscription.
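
A minimal sketch of buffer-then-replay, written in Python for illustration (the gateway itself is part of the NestJS service). `BufferedEventHub`, its method names, and the `max_buffered` cap are assumptions, not the gateway's actual API.

```python
from collections import defaultdict
from typing import Callable

Event = dict
Subscriber = Callable[[Event], None]


class BufferedEventHub:
    """Holds events for sessions with no subscriber and replays them later."""

    def __init__(self, max_buffered: int = 1000) -> None:
        self._subscribers: dict[str, list[Subscriber]] = defaultdict(list)
        self._pending: dict[str, list[Event]] = defaultdict(list)
        self._max_buffered = max_buffered

    def publish(self, session_id: str, event: Event) -> None:
        subscribers = self._subscribers.get(session_id)
        if subscribers:
            for callback in list(subscribers):
                callback(event)
            return
        # Nobody is listening yet: keep the event, up to a cap. Events past
        # the cap are dropped in this sketch to bound memory use.
        pending = self._pending[session_id]
        if len(pending) < self._max_buffered:
            pending.append(event)

    def subscribe(self, session_id: str, callback: Subscriber) -> None:
        self._subscribers[session_id].append(callback)
        # Replay anything emitted before this first subscriber arrived.
        for event in self._pending.pop(session_id, []):
            callback(event)
```

Replaying in arrival order keeps text deltas in sequence for the late subscriber; a real implementation would also need to expire buffers for sessions that never get one.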

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:41:38 -08:00
hailin 370e32599f fix: subscribe to agent WS before creating task to avoid race condition
The engine stream could emit text events before the voice pipeline
subscribed, causing all text to be lost. Now we connect and subscribe
first, then POST the task.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:35:57 -08:00
hailin abf5e29419 feat: route voice pipeline through agent-service instead of direct LLM
Voice calls now use the same agent task + WS subscription flow as the
chat UI, enabling tool use and command execution during voice sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 00:47:31 -08:00
hailin 7afbd54fce fix: rewrite voice pipeline for direct WebSocket I/O, fix TTS and navigation
Root cause: Pipecat's WebsocketServerTransport creates its own WebSocket
server on (host,port) and expects FrameProcessor subclasses. Our code was
passing a FastAPI WebSocket object as 'host' and using plain STT/TTS/VAD
service classes that aren't FrameProcessors. The pipeline crashed immediately
when receiving audio, causing "disconnects when speaking".

Changes:
- **base_pipeline.py**: Complete rewrite — replaced Pipecat Pipeline with
  direct async loop: WebSocket → VAD → STT → Claude LLM → TTS → WebSocket.
  Supports barge-in (interrupt TTS when user speaks), audio chunking, and
  24kHz→16kHz TTS resampling.
- **session_router.py**: Pass WebSocket directly to pipeline instead of
  wrapping in AppTransport.
- **app_transport.py**: Deprecated (no longer needed).
- **kokoro_service.py**: Fix misaki compatibility (MutableToken→MToken
  rename), use correct Chinese voice 'zf_xiaoxiao', handle torch tensors.
- **main.py**: Apply misaki monkey-patch before importing kokoro.
- **settings.py**: Change default TTS voice from 'zh_female_1' (non-existent)
  to 'zf_xiaoxiao' (valid Kokoro-82M Chinese female voice).
- **requirements.txt**: Remove pipecat-ai dependency, pin kokoro==0.3.5 +
  misaki==0.7.17, add Chinese NLP deps (pypinyin, cn2an, jieba, ordered-set).
- **agent_call_page.dart**: Wrap each cleanup step in try/catch to ensure
  Navigator.pop() always executes after call ends. Add 3s timeout on session
  delete request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 23:34:35 -08:00
hailin 6cd53e713c fix: bypass JWT for voice WebSocket route (fixes 401 on WS upgrade)
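Root cause: Kong logs show the voice WebSocket connection being rejected
with 401 by the JWT plugin. Flutter's WebSocketChannel.connect offers no
way to set custom headers on the upgrade request, so it cannot carry an
Authorization header.

Fix (industry-standard practice):
1. Kong: move the voice-service JWT plugin from service level to route
   level. JWT is enabled only on the voice-api and twilio-webhook routes;
   the voice-ws route is exempt (session creation has already passed JWT
   verification, and the session_id itself acts as the credential)
2. Backend: session_router now returns websocket_url as
   /ws/voice/{session_id} (matching the Kong voice-ws route path)
3. FastAPI: add a top-level /ws/voice/{session_id} WebSocket route at
   the app level that delegates to the session_router handler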

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 21:30:11 -08:00
hailin 74be945e4a feat: enable token-level streaming and fix duplicate message bubble
Backend:
- Add includePartialMessages: true to SDK query options
- Handle stream_event/content_block_delta for real-time text streaming
- Skip text/thinking blocks from complete assistant messages (already
  streamed via deltas) to avoid duplication
- Change default result summary to empty string

Flutter:
- Only show CompletedEvent summary when no assistant text was streamed
  (prevents duplicate message bubble)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:24:48 -08:00
hailin 86d7cac631 fix: replace Socket.IO with raw WebSocket to fix 502 on /ws/agent
Socket.IO requires its own handshake protocol (EIO=4) which Kong cannot
proxy as a plain WebSocket upgrade, causing 502 Bad Gateway. Switch to
@nestjs/platform-ws (WsAdapter) with manual session room tracking so
Flutter's IOWebSocketChannel can connect directly.

Also add ws/wss protocols to Kong WebSocket routes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 16:52:43 -08:00
hailin 9cdc4933dc fix: add python-multipart dependency for voice-service
Required by FastAPI for form/file upload parsing. Missing dependency
may cause import errors and container restart loops.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 16:10:50 -08:00
hailin 3cb9ebd407 fix: release QueryRunner connections to prevent pool exhaustion
TenantAwareRepository.getRepository() was calling createQueryRunner()
without ever releasing it, causing database connection pool exhaustion.
This caused ops-service (and eventually other services) to hang on
all API requests once the pool filled up.

Replaced getRepository() with withRepository() pattern that wraps
operations in try/finally to always release the QueryRunner.
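
The acquire/try/finally shape of the fix, sketched in Python for illustration (the real code is TypeScript on TypeORM). `FakePool`, `FakeRunner`, and `with_runner` are hypothetical stand-ins for the connection pool, the QueryRunner, and `withRepository()`.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class FakeRunner:
    """Hypothetical stand-in for a pooled connection / QueryRunner."""


class FakePool:
    """Tiny pool that fails once every slot is checked out."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.in_use = 0

    def acquire(self) -> FakeRunner:
        if self.in_use >= self.size:
            raise RuntimeError("pool exhausted")
        self.in_use += 1
        return FakeRunner()

    def release(self, runner: FakeRunner) -> None:
        self.in_use -= 1


def with_runner(pool: FakePool, operation: Callable[[FakeRunner], T]) -> T:
    """Run operation with a runner and always hand the runner back."""
    runner = pool.acquire()
    try:
        return operation(runner)
    finally:
        pool.release(runner)   # runs on success and on exceptions alike
```

Passing the work in as a callback means callers never hold the runner themselves, so they cannot forget to release it.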

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 15:55:06 -08:00
hailin a6cd3c20d9 feat: add WebSocket robustness to voice call (heartbeat, reconnect, jitter buffer)
Addresses reliability gaps in the real-time voice WebSocket connection
between Flutter client and Python voice-service backend.

Backend (voice-service):
- Heartbeat: new _heartbeat_sender coroutine sends JSON ping text frames
  every 15s alongside the Pipecat pipeline; failed send = dead connection
- Session preservation: on WebSocket disconnect, sessions are now marked
  "disconnected" with a timestamp instead of being deleted, allowing
  reconnection within a configurable TTL (default 60s)
- Reconnect endpoint: POST /sessions/{id}/reconnect verifies the session
  is alive and in "disconnected" state, returns fresh websocket_url
- Reconnect-aware WS handler: detects "disconnected" sessions, cancels
  stale pipeline tasks, creates a new pipeline, sends "session.resumed"
- Background cleanup: asyncio loop every 30s removes sessions that have
  been disconnected longer than session_ttl
- Structured event protocol: text frames = JSON control messages
  (ping/pong/session.resumed/session.ended/error), binary = PCM audio
- New settings: session_ttl (60s), heartbeat_interval (15s),
  heartbeat_timeout (45s)

Flutter (agent_call_page.dart):
- Heartbeat monitoring: tracks last server ping timestamp, triggers
  reconnect if no ping received in 45s (3 missed intervals)
- Auto-reconnect: exponential backoff (1s→2s→4s→8s→16s), max 5 attempts;
  calls /reconnect endpoint to verify session, rebuilds WebSocket,
  resets audio buffer, restarts heartbeat
- Reconnecting UI: yellow warning banner "重新连接中... (N/5)"
  ("Reconnecting... (N/5)") with spinner overlay during reconnection attempts
- WebSocket data routing: _onWsData distinguishes String (JSON control)
  from binary (audio) frames, handles ping/session.resumed/session.ended
- User-initiated disconnect guard: _userEndedCall flag prevents reconnect
  attempts when user intentionally hangs up
- session_id field compatibility: supports session_id/sessionId/id
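
The backoff schedule described above can be sketched as follows, in Python for illustration rather than Dart. Whether the client waits before the first attempt is an assumption here, and `backoff_delays` and `reconnect_with_backoff` are hypothetical names.

```python
import time
from typing import Callable


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   max_attempts: int = 5) -> list[float]:
    """Delay before each attempt: 1s, 2s, 4s, 8s, 16s with the defaults."""
    return [base * factor ** i for i in range(max_attempts)]


def reconnect_with_backoff(attempt: Callable[[], bool],
                           sleep: Callable[[float], None] = time.sleep,
                           max_attempts: int = 5) -> bool:
    """Wait, try, and double the wait until an attempt succeeds or we give up."""
    for delay in backoff_delays(max_attempts=max_attempts):
        sleep(delay)
        if attempt():
            return True
    return False
```

Injecting `sleep` keeps the loop testable without real waiting, for example by passing a list's `append` method to record the delays.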

Flutter (pcm_player.dart):
- Jitter buffer: queues incoming PCM chunks, starts playback only after
  accumulating 4800 bytes (150ms at 16kHz 16-bit mono) to smooth out
  network timing variance
- reset() method: clears buffer on reconnect to discard stale audio
- Buffer underrun handling: re-enters buffering phase if queue empties
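
The 4800-byte threshold follows from 16000 samples/s × 2 bytes × 0.150 s. A minimal sketch of the buffering state machine, in Python for illustration; the class and method names are assumptions, not the Dart player's API.

```python
from collections import deque
from typing import Optional

SAMPLE_RATE = 16000       # Hz
BYTES_PER_SAMPLE = 2      # 16-bit
CHANNELS = 1              # mono
PREBUFFER_MS = 150
PREBUFFER_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * PREBUFFER_MS // 1000


class JitterBuffer:
    """Queues PCM chunks and releases them only after a prebuffer fills."""

    def __init__(self, threshold: int = PREBUFFER_BYTES) -> None:
        self._chunks: deque = deque()
        self._size = 0
        self._threshold = threshold
        self._buffering = True

    def push(self, chunk: bytes) -> None:
        self._chunks.append(chunk)
        self._size += len(chunk)
        if self._buffering and self._size >= self._threshold:
            self._buffering = False          # enough audio queued to start

    def pop(self) -> Optional[bytes]:
        if self._buffering:
            return None
        if not self._chunks:
            self._buffering = True           # underrun: refill before playing
            return None
        chunk = self._chunks.popleft()
        self._size -= len(chunk)
        return chunk

    def reset(self) -> None:
        """Discard stale audio, e.g. after a reconnect."""
        self._chunks.clear()
        self._size = 0
        self._buffering = True
```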

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 07:32:19 -08:00
hailin d4391eef97 fix: run services as non-root user for SDK bypassPermissions
SDK blocks bypassPermissions when running as root for security.
Add non-root 'appuser' to Dockerfile.service and update volume
mounts to use /home/appuser/.claude paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 06:41:10 -08:00
hailin 04a18a7899 fix: use acceptEdits mode and mount .claude.json for SDK
- bypassPermissions blocked by SDK when running as root
- Switch to acceptEdits with canUseTool for programmatic control
- Mount .claude.json config file into container

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 06:37:31 -08:00
hailin db1d0620f2 debug: add stderr callback to SDK engine for error visibility
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 06:34:42 -08:00