- Add STT_PROVIDER/TTS_PROVIDER config (local or openai) in settings
- Pipeline uses OpenAI API for STT/TTS when provider is "openai"
- Skip loading local models (Kokoro/faster-whisper) when using OpenAI
- VAD (Silero) always loads for speech detection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add openai package to voice-service requirements
- Add /api/v1/test/tts/synthesize-openai (tts-1/tts-1-hd/gpt-4o-mini-tts)
- Add /api/v1/test/stt/transcribe-openai (gpt-4o-transcribe/whisper-1)
- Add OPENAI_API_KEY and OPENAI_BASE_URL env vars to voice-service
- Flutter test page: SegmentedButton to toggle Local/OpenAI provider
- All endpoints maintain same response format for easy comparison
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Claude API engine now uses streaming API (messages.stream) for real-time
text delta output instead of waiting for full response
- Agent controller accepts optional engineType body parameter to allow
callers (e.g. voice pipeline) to select a specific engine
- Fix voice_test_page.dart compilation error: replace audioplayers (not
installed) with flutter_sound (already in pubspec.yaml)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- STT: record from mic or upload audio file → faster-whisper transcription
- Round-trip: record → STT → TTS → playback (full pipeline test)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Browser-accessible page to test text-to-speech synthesis without
going through the full voice pipeline.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the first punctuation mark appeared before _MIN_SENTENCE_LEN chars,
the regex search would always find it first and skip it, permanently
blocking all subsequent sentence splits. Fix by advancing search_start
past short matches instead of breaking out of the loop.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add finished guard so that once a task reaches completed/error terminal
state, subsequent events don't flip the status back.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SDK sends text both via stream_event deltas (token-level) and assistant
message (complete block). Track hasStreamedText flag per session to skip
duplicate text extraction from assistant messages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace batch TTS (wait for full response) with streaming approach:
- _agent_generate → _agent_stream async generator (yield text chunks)
- _process_speech accumulates tokens, splits on sentence boundaries
- Each sentence is TTS'd and sent immediately while more tokens arrive
- First audio plays within ~1s of agent response vs waiting for full text
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root causes found:
1. SDK engine only emitted 'completed' without 'text' events because
mapSdkMessage skipped text blocks in 'assistant' messages (assumed
stream_event deltas would provide them, but SDK didn't send deltas)
2. Voice pipeline read evt_data.data.content but engine events are flat
(evt_data.content) — so even if text arrived, it was never extracted
Fixes:
- Extract text/thinking blocks from assistant messages in SDK engine
- Fix voice pipeline to read content directly from evt_data, not nested
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Log every SDK message type, event emission, and stream lifecycle
to diagnose why text events are missing in voice-agent flow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Log timestamps, content, and event details at each pipeline stage
to help diagnose voice-agent integration issues.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Buffer stream events when no WS clients are subscribed yet, then replay
them when a client subscribes. This eliminates the race condition where
events are lost between task creation and WS subscription.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The engine stream could emit text events before the voice pipeline
subscribed, causing all text to be lost. Now we connect and subscribe
first, then POST the task.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Voice calls now use the same agent task + WS subscription flow as the
chat UI, enabling tool use and command execution during voice sessions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: Pipecat's WebsocketServerTransport creates its own WebSocket
server on (host,port) and expects FrameProcessor subclasses. Our code was
passing a FastAPI WebSocket object as 'host' and using plain STT/TTS/VAD
service classes that aren't FrameProcessors. The pipeline crashed immediately
when receiving audio, causing "disconnects when speaking".
Changes:
- **base_pipeline.py**: Complete rewrite — replaced Pipecat Pipeline with
direct async loop: WebSocket → VAD → STT → Claude LLM → TTS → WebSocket.
Supports barge-in (interrupt TTS when user speaks), audio chunking, and
24kHz→16kHz TTS resampling.
- **session_router.py**: Pass WebSocket directly to pipeline instead of
wrapping in AppTransport.
- **app_transport.py**: Deprecated (no longer needed).
- **kokoro_service.py**: Fix misaki compatibility (MutableToken→MToken
rename), use correct Chinese voice 'zf_xiaoxiao', handle torch tensors.
- **main.py**: Apply misaki monkey-patch before importing kokoro.
- **settings.py**: Change default TTS voice from 'zh_female_1' (non-existent)
to 'zf_xiaoxiao' (valid Kokoro-82M Chinese female voice).
- **requirements.txt**: Remove pipecat-ai dependency, pin kokoro==0.3.5 +
misaki==0.7.17, add Chinese NLP deps (pypinyin, cn2an, jieba, ordered-set).
- **agent_call_page.dart**: Wrap each cleanup step in try/catch to ensure
Navigator.pop() always executes after call ends. Add 3s timeout on session
delete request.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend:
- Add includePartialMessages: true to SDK query options
- Handle stream_event/content_block_delta for real-time text streaming
- Skip text/thinking blocks from complete assistant messages (already
streamed via deltas) to avoid duplication
- Change default result summary to empty string
Flutter:
- Only show CompletedEvent summary when no assistant text was streamed
(prevents duplicate message bubble)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Socket.IO requires its own handshake protocol (EIO=4) which Kong cannot
proxy as a plain WebSocket upgrade, causing 502 Bad Gateway. Switch to
@nestjs/platform-ws (WsAdapter) with manual session room tracking so
Flutter's IOWebSocketChannel can connect directly.
Also add ws/wss protocols to Kong WebSocket routes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Required by FastAPI for form/file upload parsing. Missing dependency
may cause import errors and container restart loops.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TenantAwareRepository.getRepository() was calling createQueryRunner()
without ever releasing it, causing database connection pool exhaustion.
This caused ops-service (and eventually other services) to hang on
all API requests once the pool filled up.
Replaced getRepository() with withRepository() pattern that wraps
operations in try/finally to always release the QueryRunner.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses reliability gaps in the real-time voice WebSocket connection
between Flutter client and Python voice-service backend.
Backend (voice-service):
- Heartbeat: new _heartbeat_sender coroutine sends JSON ping text frames
every 15s alongside the Pipecat pipeline; failed send = dead connection
- Session preservation: on WebSocket disconnect, sessions are now marked
"disconnected" with a timestamp instead of being deleted, allowing
reconnection within a configurable TTL (default 60s)
- Reconnect endpoint: POST /sessions/{id}/reconnect verifies the session
is alive and in "disconnected" state, returns fresh websocket_url
- Reconnect-aware WS handler: detects "disconnected" sessions, cancels
stale pipeline tasks, creates a new pipeline, sends "session.resumed"
- Background cleanup: asyncio loop every 30s removes sessions that have
been disconnected longer than session_ttl
- Structured event protocol: text frames = JSON control messages
(ping/pong/session.resumed/session.ended/error), binary = PCM audio
- New settings: session_ttl (60s), heartbeat_interval (15s),
heartbeat_timeout (45s)
Flutter (agent_call_page.dart):
- Heartbeat monitoring: tracks last server ping timestamp, triggers
reconnect if no ping received in 45s (3 missed intervals)
- Auto-reconnect: exponential backoff (1s→2s→4s→8s→16s), max 5 attempts;
calls /reconnect endpoint to verify session, rebuilds WebSocket,
resets audio buffer, restarts heartbeat
- Reconnecting UI: yellow warning banner "重新连接中... (N/5)" with
spinner overlay during reconnection attempts
- WebSocket data routing: _onWsData distinguishes String (JSON control)
from binary (audio) frames, handles ping/session.resumed/session.ended
- User-initiated disconnect guard: _userEndedCall flag prevents reconnect
attempts when user intentionally hangs up
- session_id field compatibility: supports session_id/sessionId/id
Flutter (pcm_player.dart):
- Jitter buffer: queues incoming PCM chunks, starts playback only after
accumulating 4800 bytes (150ms at 16kHz 16-bit mono) to smooth out
network timing variance
- reset() method: clears buffer on reconnect to discard stale audio
- Buffer underrun handling: re-enters buffering phase if queue empties
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SDK blocks bypassPermissions when running as root for security.
Add non-root 'appuser' to Dockerfile.service and update volume
mounts to use /home/appuser/.claude paths.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- bypassPermissions blocked by SDK when running as root
- Switch to acceptEdits with canUseTool for programmatic control
- Mount .claude.json config file into container
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In a Docker container without TTY, permissionMode 'default' blocks
waiting for interactive permission prompts. Switch to bypassPermissions
with canUseTool callback for programmatic risk-based access control.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
tsc with module=commonjs converts `await import()` to require(),
which breaks ESM-only packages. Use Function('return import()')
workaround to preserve native dynamic import at runtime.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Mount ~/.claude/ into agent-service container for OAuth token access
- Switch default engine to claude_agent_sdk
- Remove ANTHROPIC_API_KEY from env in subscription mode so SDK uses OAuth
- Keep API key mode for per-tenant billing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Follow iConsulting pattern: set NODE_TLS_REJECT_UNAUTHORIZED=0 when
ANTHROPIC_BASE_URL is configured, enabling connection through the
self-signed proxy at 67.223.119.33.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change AGENT_ENGINE_TYPE from claude_code_cli to claude_api in docker-compose
- Add ANTHROPIC_BASE_URL env var support to claude-api-engine
- Add ANTHROPIC_BASE_URL to agent-service environment in docker-compose
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add restart: unless-stopped to all 12 Docker services
- Add process.on(unhandledRejection/uncaughtException) to all 7 service main.ts
- Fix handleEventTrigger using tenantId UUID as schema name instead of slug lookup
- Wrap Redis event subscription callbacks in try/catch
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace traditional on-device speech_to_text with a modern pipeline:
- Record audio via `record` package with hardware noise suppression
- Apply GTCRN neural denoising (sherpa-onnx, ICASSP 2024, 48K params)
- Trim silence, POST to backend /voice/transcribe (faster-whisper)
Changes:
- Add /transcribe endpoint to voice-service for audio file upload
- Add SpeechEnhancer wrapper for sherpa-onnx GTCRN model (523KB)
- Rewrite chat_page.dart voice input: record → denoise → transcribe
- Keep NoiseReducer.trimSilence for silence removal only
- Upgrade record to v6.2.0, add sherpa_onnx, path_provider
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend:
- Enhanced register endpoint to accept companyName for self-service
tenant creation with schema provisioning and admin user setup
- Added TenantInvite entity with token-based invitation system
- Added invite CRUD endpoints to TenantController (create/list/revoke)
- Added public endpoints for invite validation and acceptance
Frontend:
- Created registration page with optional organization name field
- Created invitation acceptance page at /invite/[token]
- Added invite management UI to tenant detail page
- Updated login page with link to registration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement remaining backend controllers for all web admin menu pages:
- SettingsController: general, notification, theme, account, API keys
- RoleController: CRUD roles with permission assignment
- PermissionController: permission matrix for RBAC management
- MetricsController: server metrics overview and per-server data
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Map flat quota fields to nested quota object and add userCount field
to match the frontend's expected Tenant interface.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add UsersController to auth-service for user CRUD (GET/POST/PUT/DELETE /api/v1/auth/users)
- Add Kong route /api/v1/admin -> auth-service for tenant management
- Remove AuthGuard from TenantController (Kong handles JWT)
- Fix frontend agent-config API paths from /api/v1/agent/config to /api/v1/agent-config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kong handles JWT validation at the gateway level. Service-level
AuthGuard('jwt') fails because services don't register a Passport
JWT strategy (only auth-service does). Removed from 17 controllers
across ops, inventory, monitor, comm, audit, and agent services.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace global JWT plugin with per-service JWT (skip auth-service)
to fix auth routes being blocked by global JWT in DB-less mode
- Fix UserRepository and ApiKeyRepository to use standard TypeORM
instead of TenantAwareRepository (users are global, not per-schema)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add kid claim to auth-service JWT for Kong validation
- Add Kong consumer with JWT credential (shared secret via env)
- Add agent-config route to Kong for /api/v1/agent-config
- Kong Dockerfile uses entrypoint script to inject JWT_SECRET at runtime
- Fix frontend login path (/auth/login → /api/v1/auth/login)
- Extract tenantId from JWT on login and store as current_tenant
- Add auth guard in admin layout (redirect to /login if no token)
- Pass JWT_SECRET env var to Kong container in docker-compose
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace TenantAwareRepository with standard @InjectRepository
(TenantAwareRepository requires AsyncLocalStorage tenant context
middleware which agent-service does not have)
- Replace @TenantId() decorator with @Headers('x-tenant-id')
for direct HTTP header extraction
- Return defaults gracefully when no tenant is selected
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Agent-service does not have a registered Passport JWT strategy —
JWT validation is handled by Kong API gateway. The AuthGuard was
causing 500 "Unknown authentication strategy" errors on all
new controller endpoints.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement missing REST API endpoints that the web-admin frontend
pages were calling but had no backend support:
- GET/POST/PUT /api/v1/agent-config (engine, prompt, turns, budget, tools)
- GET/POST/PUT/DELETE /api/v1/agent/skills (CRUD for agent skills)
- GET/POST/PUT/DELETE /api/v1/agent/hooks (CRUD for hook scripts)
Each endpoint includes entity, repository, service, and controller
layers following the existing DDD + tenant-aware patterns.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Model downloads (Whisper, Kokoro, Silero VAD) are synchronous blocking
calls that prevent uvicorn from completing startup and responding to
healthchecks. Move all model loading to a daemon thread so the server
starts immediately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap model loading in try/except so server starts even if models fail
- Fix device env var mapping (unified 'device' field instead of 'whisper_device')
- Default Whisper model to 'base' instead of 'large-v3' (3GB) for CPU deployment
- Increase healthcheck start_period to 120s for model download time
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Dockerfile.service: fix entry point path (dist/services/{name}/src/main)
due to tsconfig paths widening rootDir during compilation
- Kong config: remove unsupported ws/wss protocols (WebSocket works
automatically over http/https in Kong 3.7)
- voice-service: fix pipecat import path for v0.0.30 API
(pipecat.transports.network.websocket_server with lowercase class names)
- voice-service: add openai dependency required by pipecat anthropic service
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
faster-whisper 1.0.0 depends on av==11.* which has no prebuilt wheels
and fails to compile. Version 1.2.1 uses av 12+ with prebuilt wheels.
Also removed unnecessary FFmpeg dev libraries from Dockerfile.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PyAV (av==11, dep of faster-whisper) requires pkg-config and
FFmpeg development headers to compile from source.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Server is on HK network, no need for China mirrors. Added
build-essential for compiling native Python packages (kokoro, etc).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
web-admin npm ci was timing out on the server. Added npmmirror.com
for npm and tsinghua mirror for pip to resolve network issues.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>