Implements a two-level abort controller design to support real-time
interruption when the user speaks while the agent is still responding:
sessionAbortController (session-scoped)
- Created once when startSession() is called
- Fired only by terminateSession() (user hangs up)
- Propagated into each turn via addEventListener
turnAbort (per-turn, stored as handle.currentTurnAbort)
- Created fresh at the start of each executeTurn() call
- Stored on the VoiceSessionHandle so injectMessage() can abort it
- When a new inject arrives while a turn is running, injectMessage()
calls turnAbort.abort() BEFORE enqueuing the new message
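A minimal sketch of the two-level hierarchy described above (names like VoiceSessionHandle and beginTurn follow this description; details are illustrative, not the actual implementation):

```typescript
// Sketch of the two-level abort hierarchy: one session-scoped controller,
// one fresh controller per turn, with session abort propagated into the turn.
interface VoiceSessionHandle {
  sessionAbort: AbortController;            // fired only on hangup
  currentTurnAbort: AbortController | null; // fresh per turn
}

function startSession(): VoiceSessionHandle {
  return { sessionAbort: new AbortController(), currentTurnAbort: null };
}

function beginTurn(handle: VoiceSessionHandle): AbortController {
  const turnAbort = new AbortController();
  // Propagate the session-level abort into this turn via addEventListener.
  handle.sessionAbort.signal.addEventListener('abort', () => turnAbort.abort(), { once: true });
  handle.currentTurnAbort = turnAbort;
  return turnAbort;
}

function injectMessage(handle: VoiceSessionHandle): void {
  // Abort the running turn BEFORE enqueuing the new message.
  handle.currentTurnAbort?.abort();
  // ...enqueue the new message here...
}
```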
Interruption flow:
1. User speaks mid-response → LiveKit stops TTS playback (client-side)
2. STT utterance → POST voice/inject → injectMessage() fires
3. handle.currentTurnAbort.abort() called → sets aborted flag
4. for-await loop checks turnAbort.signal.aborted on next SDK event → break
5. catch block NOT reached (break ≠ exception) → no error event emitted
6. finally block saves partial text with a "[中断]" ("interrupted") suffix to history
7. New message dequeued → fresh executeTurn() starts immediately
Why no "Agent error" message plays to the user:
- break exits the for-await loop silently, not via exception
- The catch block's error-event emission is guarded by err?.name !== 'AbortError'
AND requires an actual exception; a plain break never enters catch
- Empty or partial responses are filtered by `if response:` in agent.py
Also update module-level JSDoc with full architecture explanation covering
the long-lived run loop design, two-level abort hierarchy, tenant context
injection pattern, and SDK session resume across turns.
Update agent.py module docstring to document voice session lifecycle and
interruption flow for future maintainers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the per-turn POST /tasks approach for voice calls with a
long-lived agent run loop tied to the call lifecycle:
agent-service:
- Add AsyncQueue<T> utility for blocking message relay
- Add VoiceSessionManager: spawns one background run loop per voice call,
accepts injected messages, terminates cleanly on hangup
- Add VoiceSessionController with 3 endpoints:
POST /api/v1/agent/sessions/voice/start (call start)
POST /api/v1/agent/sessions/:id/voice/inject (each speech turn)
DELETE /api/v1/agent/sessions/:id/voice (user hung up)
- Register VoiceSessionManager + VoiceSessionController in agent.module.ts
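The AsyncQueue<T> itself is not shown in this log; a minimal promise-based sketch of a blocking message relay could look like this (illustrative, not the committed code):

```typescript
// Minimal blocking queue: dequeue() resolves immediately if an item is
// buffered, otherwise suspends until the next enqueue() hands one over.
class AsyncQueue<T> {
  private items: T[] = [];
  private waiters: Array<(value: T) => void> = [];

  enqueue(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter(item);   // hand directly to a blocked consumer
    else this.items.push(item); // otherwise buffer for a later dequeue
  }

  dequeue(): Promise<T> {
    if (this.items.length > 0) return Promise.resolve(this.items.shift() as T);
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }
}
```

The run loop awaits dequeue() between turns, so injectMessage() reduces to an enqueue.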
voice-agent:
- AgentServiceLLM: add start_voice_session(), terminate_voice_session(),
inject_text_message() (voice/inject-aware), _do_inject_voice()
- AgentServiceLLMStream._run(): use voice/inject path when voice session
is active; fall back to per-task POST for text-chat / non-SDK engines
- entrypoint(): call start_voice_session() after session.start();
register _on_room_disconnect that calls terminate_voice_session()
so the agent is always killed when the user hangs up
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two issues fixed:
1. agent.controller.ts — on the FIRST task of each session, write title+voiceMode
into session.metadata so the client can display a meaningful conversation title:
  - Text sessions: metadata.title = first 40 chars of the user prompt
  - Voice sessions: metadata.title = '' and metadata.voiceMode = true
    (Flutter renders these as '语音对话 M/D HH:mm', i.e. 'Voice Call M/D HH:mm')
  A titleSet flag prevents overwriting the title on subsequent turns of the same session.
2. session.controller.ts — listSessions() now returns a DTO instead of the raw entity.
systemPrompt is an internal engine instruction and is explicitly excluded from the
response. The client receives { id, status, engineType, metadata, createdAt, updatedAt }.
(Background: voice sessions set systemPrompt to the voice-mode instruction string,
so before this fix every voice conversation displayed '你正在通过语音与用户实时对话。请…',
the Chinese voice-mode prompt, as its title in the chat history list.)
Title derivation priority (highest to lowest):
1. metadata.title — explicit title saved by backend on first task
2. metadata.voiceMode == true → '语音对话 M/D HH:mm' ('Voice Call M/D HH:mm')
3. Fallback → '对话 M/D HH:mm' ('Chat M/D HH:mm') based on session createdAt
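The priority order above can be sketched as follows (illustrative TypeScript; the real client code is Flutter/Dart, and the date formatter is injected here for testability):

```typescript
// Title derivation in priority order: explicit title, then voice fallback,
// then generic fallback based on createdAt.
interface SessionSummary {
  metadata?: { title?: string; voiceMode?: boolean };
  createdAt: Date;
}

function deriveTitle(s: SessionSummary, fmt: (d: Date) => string): string {
  if (s.metadata?.title) return s.metadata.title;                  // 1. explicit title
  if (s.metadata?.voiceMode) return `语音对话 ${fmt(s.createdAt)}`; // 2. voice fallback
  return `对话 ${fmt(s.createdAt)}`;                                // 3. generic fallback
}
```

Note that the backend's empty string title for voice sessions is falsy, so voice sessions fall through to rule 2 as intended.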
The billing-service tsconfig.json was missing the TypeScript path aliases
required for the workspace build (turbo builds shared packages first, then
resolves @it0/* via paths). Without these, nest build fails with
'Cannot find module @it0/database'.
Also disables overly strict checks (strictNullChecks, strictPropertyInitialization,
useUnknownInCatchVariables) to match the lenient settings used by other services.
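The added compilerOptions would look roughly like this in billing-service/tsconfig.json (the exact alias targets are assumptions based on the @it0/* naming and a typical monorepo layout):

```json
{
  "compilerOptions": {
    "baseUrl": ".",
    "paths": {
      "@it0/database": ["../../packages/shared/database/src"],
      "@it0/*": ["../../packages/shared/*/src"]
    },
    "strictNullChecks": false,
    "strictPropertyInitialization": false,
    "useUnknownInCatchVariables": false
  }
}
```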
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comprehensive fix of 124 TS errors across the billing-service:
Entity fixes:
- invoice.entity.ts: add InvoiceStatus/InvoiceCurrency const objects,
rename fields to match DB schema (subtotalCents, taxCents, totalCents,
amountDueCents), add OneToMany items relation
- invoice-item.entity.ts: add InvoiceItemType const object, add column
name mappings and currency field
- payment.entity.ts: add PaymentStatus const, rename amount→amountCents
with column name mapping, add paidAt field
- subscription.entity.ts: add SubscriptionStatus const object
- usage-aggregate.entity.ts: rename periodYear/Month→year/month to match
DB columns, add periodStart/periodEnd fields
- payment-method.entity.ts: add displayName, expiresAt, updatedAt fields
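The "const object" pattern used for InvoiceStatus and the other enums above is presumably the usual TypeScript as-const idiom, which makes one name usable as both a runtime value and a union type (the status values below are assumptions for illustration):

```typescript
// The as-const idiom: InvoiceStatus works as runtime values AND as a type.
const InvoiceStatus = {
  DRAFT: 'draft',
  OPEN: 'open',
  PAID: 'paid',
  VOID: 'void',
} as const;
type InvoiceStatus = (typeof InvoiceStatus)[keyof typeof InvoiceStatus];

// Value usage: invoice.status = InvoiceStatus.PAID
// Type usage:  function setStatus(s: InvoiceStatus) { ... }
```

Unlike a plain `type` union, the const object survives compilation, which is why the generate-monthly-invoice fix below can use these names as values.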
Port/Provider fixes:
- payment-provider.port.ts: make PaymentProviderType a const object (not
just a type), add PaymentSessionRequest alias, rename WebhookEvent with
correct field shape (type vs eventType), make providerPaymentId optional
- All 4 providers: replace PaymentSessionRequest→CreatePaymentParams,
fix amountCents→amount, remove sessionId from PaymentSession return,
add confirmPayment() stub, fix Stripe API version to '2023-10-16'
Use case fixes:
- aggregate-usage.use-case.ts: replace 'redis' with 'ioredis' (workspace
standard); rewrite using ioredis xreadgroup API
- change/check/generate use cases: fix Plan field names
(monthlyPriceCentsUsd, includedTokens, overageRateCentsPerMTokenUsd)
- generate-monthly-invoice: fix SubscriptionStatus/InvoiceCurrency as
values (now const objects)
- handle-payment-webhook: fix WebhookResult import, result.type usage,
payment.paidAt
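ioredis returns xreadgroup results as nested arrays; a sketch of consuming them (the group, consumer, and stream names are hypothetical, and the live call is shown only as a comment since it needs a Redis server):

```typescript
// ioredis xreadgroup replies have the shape:
//   [[streamKey, [[entryId, [field1, value1, field2, value2, ...]], ...]], ...]
// A typical call (requires a live Redis; names are illustrative):
//   const reply = await redis.xreadgroup(
//     'GROUP', 'billing', 'worker-1',
//     'COUNT', 10, 'BLOCK', 5000,
//     'STREAMS', 'usage-events', '>');

type StreamEntry = { id: string; fields: Record<string, string> };

function parseXReadGroupReply(reply: unknown): StreamEntry[] {
  const out: StreamEntry[] = [];
  for (const [, entries] of (reply ?? []) as [string, [string, string[]][]][]) {
    for (const [id, flat] of entries) {
      // Flatten the alternating field/value list into an object.
      const fields: Record<string, string> = {};
      for (let i = 0; i < flat.length; i += 2) fields[flat[i]] = flat[i + 1];
      out.push({ id, fields });
    }
  }
  return out;
}
```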
Controller/Repository fixes:
- plan.controller.ts, plan.repository.ts: fix Plan field names
- webhook.controller.ts: remove express import, use any for req type
- invoice-generator.service.ts: fix overageAmountCents→overageCentsUsd,
monthlyPriceCny→monthlyPriceFenCny, includedTokensPerMonth→includedTokens
Dependencies:
- billing-service/package.json: replace redis with ioredis dependency
- pnpm-lock.yaml: regenerated after ioredis addition
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Dockerfile.service: add COPY lines for billing-service/package.json in
  both the build and production stages so pnpm install includes its deps
  (the omission caused a 'node_modules missing' turbo build error)
- pnpm-lock.yaml: regenerated after running pnpm install to include all
billing-service dependencies (stripe, alipay-sdk, wechat-pay-v3, etc.)
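The added COPY lines would look roughly like this (the paths are assumptions based on a typical pnpm workspace Dockerfile layout):

```dockerfile
# Build stage: pnpm needs every workspace package.json present before
# `pnpm install`, or the billing-service deps are silently skipped.
COPY services/billing-service/package.json ./services/billing-service/

# Production stage: the same copy so the pruned install also resolves
# billing-service dependencies.
COPY services/billing-service/package.json ./services/billing-service/
```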
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- 005-create-billing-tables.sql: replace all `it0_shared.tenants` with
`public.tenants` and all `tenant_id VARCHAR(20)` with `tenant_id UUID`
to match the actual server DB schema (public schema, UUID primary key)
- packages/shared/testing src/test-utils.ts: add new quota fields
(maxServers, maxUsers, maxStandingOrders, maxAgentTokensPerMonth) to
TEST_TENANT mock to satisfy the extended TenantInfo interface, fixing
the @it0/testing TypeScript build error
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ensure /data/versions/android and /data/versions/ios directories are
created with correct appuser ownership during image build, fixing
EACCES permission error when version-service starts.
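The image-build fix presumably amounts to something like the following (the appuser name comes from the description above; the layout is an assumption):

```dockerfile
# Create the version directories at build time and hand ownership to the
# runtime user, so version-service can write without hitting EACCES.
RUN mkdir -p /data/versions/android /data/versions/ios \
    && chown -R appuser:appuser /data/versions
```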
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The entrypoint.sh expects dist/services/${SERVICE_NAME}/src/main, but
nest build with inline TypeORM config produces dist/main directly.
Using DatabaseModule from @it0/database forces tsc to emit the nested
path structure (since it references shared packages), matching the
entrypoint path convention used by all other services.
The service also gains SnakeNamingStrategy and autoLoadEntities from the shared module.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
voice-agent agent.py:
- Module docstring explains lk.agent.state lifecycle
(initializing → listening → thinking → speaking)
- Explains how RoomIO publishes state as participant attribute
- Documents BackgroundAudioPlayer with all available built-in clips
Flutter agent_call_page.dart:
- Documents _agentState field and all possible values
- Documents ParticipantAttributesChanged listener with UI mapping
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Import from livekit.agents.voice.background_audio submodule directly,
as it's not re-exported from livekit.agents.voice.__init__.py.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- voice-agent: enable BackgroundAudioPlayer with keyboard typing sound
during LLM thinking state (auto-plays when agent enters "thinking",
stops when "speaking" starts)
- Flutter: monitor lk.agent.state participant attribute from LiveKit
agent, show pulsing dots animation + "思考中..." text when thinking,
avatar border changes to warning color with pulsing glow ring
- Both call mode and chat mode headers show thinking state
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Detailed record of why livekit-plugins-speechmatics was removed:
- EXTERNAL: no FINAL_TRANSCRIPT (framework never sends FlushSentinel)
- ADAPTIVE: zero output (dual Silero VAD conflict)
- SMART_TURN: fragments Chinese speech into tiny pieces
- FIXED: finalize() async race condition with session teardown
All tested on 2026-03-03, none viable with LiveKit agents v1.4.4.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SMART_TURN fragments continuous speech into tiny pieces, each triggering
an LLM request that aborts the previous one. FIXED mode waits for a
configurable silence duration (1.0s) before emitting FINAL_TRANSCRIPT
via the built-in END_OF_UTTERANCE handler.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document all findings from the integration process directly in the
source code for future reference:
1. Language code mapping: Speechmatics uses ISO 639-3 "cmn" for
Mandarin, but LiveKit LanguageCode auto-normalizes it to "zh".
Must override stt._stt_options.language after construction.
2. Turn detection modes (critical):
- EXTERNAL: unusable — LiveKit never sends FlushSentinel, only
pushes silence frames, so FINAL_TRANSCRIPT never arrives
- ADAPTIVE: unusable — client-side Silero VAD conflicts with
LiveKit's own VAD, produces zero transcription output
- SMART_TURN: correct choice — server-side intelligent turn
detection, auto-emits FINAL_TRANSCRIPT, fully compatible
3. Speaker diarization: is_active flag distinguishes primary speaker
from TTS echo, solving the "speaker confusion" problem
4. Docker deployment: SPEECHMATICS_API_KEY in .env, watch for
COPY layer cache when rebuilding
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace EXTERNAL mode + monkey-patch hack with SMART_TURN mode.
SMART_TURN uses Speechmatics server-side turn detection that properly
emits AddSegment (FINAL_TRANSCRIPT) when the user finishes speaking.
No client-side finalize or debounce timer needed.
Ref: https://docs.speechmatics.com/integrations-and-sdks/livekit
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Speechmatics re-sends identical partial segments during silence, causing
the debounce timer to fire multiple times with the same text. Each
duplicate FINAL aborts the in-flight LLM request and restarts it.
Replace time-based cooldown with text comparison: skip finalization if
the segment text matches the last finalized text. Also skip starting
new timers when partial text hasn't changed from last finalized.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reduce debounce delay from 700ms to 400ms for faster response
- Add 1.5s cooldown after emitting FINAL to prevent duplicate triggers
that cause LLM abort/retry cycles
- Enable speaker diarization (enable_diarization=True)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The LiveKit framework never sends FlushSentinel to the STT stream.
Instead it pushes silence frames and waits for FINAL_TRANSCRIPT events.
In EXTERNAL turn-detection mode, Speechmatics only emits partials.
New approach: each partial transcript restarts a 700ms debounce timer.
When partials stop (user stops speaking), the timer fires and promotes
the last partial to FINAL_TRANSCRIPT, unblocking the pipeline.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Trace _patched_process_audio lifecycle and FlushSentinel handling
to diagnose why final transcripts are not being promoted.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
VoiceAgentClient.finalize() schedules an async task chain that often
loses the race against session teardown. Instead, intercept partial
segments as they arrive, stash them, and synchronously emit them as
FINAL_TRANSCRIPT when FlushSentinel fires.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the SpeechStream._process_audio patch from container runtime
into our own source code so it survives Docker rebuilds. The patch
adds client.finalize() on FlushSentinel so EXTERNAL mode produces
final transcripts when LiveKit's VAD detects end of speech.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EXTERNAL mode produces partial transcripts but livekit-plugins-speechmatics
does not call finalize() when receiving a flush sentinel from the framework.
A runtime monkey-patch on the plugin's SpeechStream._process_audio adds the
missing finalize() call so final transcripts are generated.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Speechmatics handles end-of-utterance natively via its Voice Agent
API (ADAPTIVE mode). Use turn_detection="stt" on AgentSession so
LiveKit delegates turn boundaries to the STT engine instead of
conflicting with its own VAD-based turn detection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ADAPTIVE mode enables a second client-side Silero VAD inside the
Speechmatics SDK that conflicts with LiveKit's own VAD pipeline,
causing no transcription to be returned. EXTERNAL mode delegates
turn detection to LiveKit.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
LiveKit's LanguageCode class normalizes ISO 639-3 codes to ISO 639-1
(cmn → zh), but Speechmatics API expects "cmn" not "zh". Override
the internal _stt_options.language after construction.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend APIs return arrays directly, not { data, total } wrappers.
Changed 21 interface declarations to type aliases matching actual
API response format.
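The shape change is the standard wrapper-to-array swap; for example (the Server fields are hypothetical):

```typescript
// Before: pages assumed a wrapped envelope that the backend never sent.
// interface ListServersResponse { data: Server[]; total: number }

// After: the backend returns the array directly, so a type alias matches reality.
interface Server { id: string; name: string } // fields are illustrative
type ListServersResponse = Server[];

const res: ListServersResponse = [{ id: 's1', name: 'web-01' }];
```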
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All pages expected API responses in { data: [], total } format but
backend APIs return plain arrays. Changed data?.data ?? [] to data ?? []
across 22 page components.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When API returns 401, clear stored tokens and redirect to /login
instead of showing an error message.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The web-admin frontend was calling incorrect API paths that didn't match
the actual backend service routes through Kong gateway, causing all
requests to fail with 404 or route-mismatch errors.
URL corrections:
- servers: /api/v1/servers → /api/v1/inventory/servers
- runbooks: /api/v1/runbooks → /api/v1/ops/runbooks
- risk-rules: /api/v1/security/risk-rules → /api/v1/agent/risk-rules
- credentials: /api/v1/security/credentials → /api/v1/inventory/credentials
- roles: /api/v1/security/roles → /api/v1/auth/roles
- permissions: /api/v1/security/permissions → /api/v1/auth/permissions
- tenants: /api/v1/tenants → /api/v1/admin/tenants
- communication: /api/v1/communication → /api/v1/comm
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Problem:
- Text input area caused BOTTOM OVERFLOWED BY 135 PIXELS when keyboard opened
- Input bar overlapped with call control buttons
- Sent messages were not displayed on screen (only SnackBar feedback)
Solution — split into two distinct layouts:
1. Call Mode (default):
- Full-screen call UI: avatar, waveform, duration, large control buttons
- Keyboard button in controls toggles to chat mode
- No text input elements — clean voice-only interface
2. Chat Mode (tap keyboard button):
- Compact call header: green status dot + "iAgent" + duration + inline
mute/end/speaker/collapse controls
- Scrollable message list (Expanded widget — properly handles keyboard)
- User messages: right-aligned blue bubbles with attachment thumbnails
- Agent responses: left-aligned gray bubbles with robot avatar
- Input bar at bottom: attachment picker + text field + send button
Message display:
- User-sent text/attachments tracked in _messages list, shown as bubbles
- Agent responses sent back via LiveKit data channel (topic='text_reply')
from voice-agent → Flutter, displayed as assistant bubbles
- Auto-scroll to latest message
Voice-agent change (agent.py):
- After session.say(response), publish response text back to Flutter via
ctx.room.local_participant.publish_data() with topic='text_reply'
- Flutter listens for DataReceivedEvent to display agent responses
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>