Commit Graph

65 Commits

Author SHA1 Message Date
hailin 8e4bd573f4 fix: deduplicate text events from SDK stream_event and assistant message
SDK sends text both via stream_event deltas (token-level) and assistant
message (complete block). Track hasStreamedText flag per session to skip
duplicate text extraction from assistant messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:31:48 -08:00
hailin 65e68a0487 feat: streaming TTS — synthesize per-sentence as agent tokens arrive
Replace batch TTS (wait for full response) with streaming approach:
- _agent_generate → _agent_stream async generator (yield text chunks)
- _process_speech accumulates tokens, splits on sentence boundaries
- Each sentence is TTS'd and sent immediately while more tokens arrive
- First audio plays within ~1s of agent response vs waiting for full text

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:14:22 -08:00
hailin aa2a49afd4 fix: extract text from assistant message + fix event data parsing
Root causes found:
1. SDK engine only emitted 'completed' without 'text' events because
   mapSdkMessage skipped text blocks in 'assistant' messages (assumed
   stream_event deltas would provide them, but SDK didn't send deltas)
2. Voice pipeline read evt_data.data.content but engine events are flat
   (evt_data.content) — so even if text arrived, it was never extracted

Fixes:
- Extract text/thinking blocks from assistant messages in SDK engine
- Fix voice pipeline to read content directly from evt_data, not nested

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:01:25 -08:00
hailin a7b42e6b98 feat: add detailed logging to agent engine and task controller
Log every SDK message type, event emission, and stream lifecycle
to diagnose why text events are missing in voice-agent flow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:56:09 -08:00
hailin 0dbe711ed3 feat: add detailed logging to voice pipeline (STT/Agent/TTS timing)
Log timestamps, content, and event details at each pipeline stage
to help diagnose voice-agent integration issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:47:21 -08:00
hailin 1d5c834dfe feat: add event buffering to agent WS gateway for late subscribers
Buffer stream events when no WS clients are subscribed yet, then replay
them when a client subscribes. This eliminates the race condition where
events are lost between task creation and WS subscription.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:41:38 -08:00
hailin 370e32599f fix: subscribe to agent WS before creating task to avoid race condition
The engine stream could emit text events before the voice pipeline
subscribed, causing all text to be lost.  Now we connect and subscribe
first, then POST the task.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:35:57 -08:00
hailin abf5e29419 feat: route voice pipeline through agent-service instead of direct LLM
Voice calls now use the same agent task + WS subscription flow as the
chat UI, enabling tool use and command execution during voice sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 00:47:31 -08:00
hailin 7afbd54fce fix: rewrite voice pipeline for direct WebSocket I/O, fix TTS and navigation
Root cause: Pipecat's WebsocketServerTransport creates its own WebSocket
server on (host,port) and expects FrameProcessor subclasses. Our code was
passing a FastAPI WebSocket object as 'host' and using plain STT/TTS/VAD
service classes that aren't FrameProcessors. The pipeline crashed immediately
when receiving audio, causing "disconnects when speaking".

Changes:
- **base_pipeline.py**: Complete rewrite — replaced Pipecat Pipeline with
  direct async loop: WebSocket → VAD → STT → Claude LLM → TTS → WebSocket.
  Supports barge-in (interrupt TTS when user speaks), audio chunking, and
  24kHz→16kHz TTS resampling.
- **session_router.py**: Pass WebSocket directly to pipeline instead of
  wrapping in AppTransport.
- **app_transport.py**: Deprecated (no longer needed).
- **kokoro_service.py**: Fix misaki compatibility (MutableToken→MToken
  rename), use correct Chinese voice 'zf_xiaoxiao', handle torch tensors.
- **main.py**: Apply misaki monkey-patch before importing kokoro.
- **settings.py**: Change default TTS voice from 'zh_female_1' (non-existent)
  to 'zf_xiaoxiao' (valid Kokoro-82M Chinese female voice).
- **requirements.txt**: Remove pipecat-ai dependency, pin kokoro==0.3.5 +
  misaki==0.7.17, add Chinese NLP deps (pypinyin, cn2an, jieba, ordered-set).
- **agent_call_page.dart**: Wrap each cleanup step in try/catch to ensure
  Navigator.pop() always executes after call ends. Add 3s timeout on session
  delete request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 23:34:35 -08:00
hailin 6cd53e713c fix: bypass JWT for voice WebSocket route (fixes 401 on WS upgrade)
根因:Kong 日志显示 voice WebSocket 连接被 JWT 插件返回 401,
因为 WebSocket RFC 6455 不支持自定义 header,Flutter 的
WebSocketChannel.connect 无法携带 Authorization header。

修复策略(业界标准做法):
1. Kong: 将 voice-service 的 JWT 从 service 级别改为 route
   级别,仅在 voice-api 和 twilio-webhook 路由启用 JWT,
   voice-ws 路由免除(session 创建已通过 JWT 验证,
   session_id 本身作为认证凭据)
2. 后端: session_router 返回的 websocket_url 改为
   /ws/voice/{session_id}(匹配 Kong voice-ws 路由路径)
3. FastAPI: 在 app 级别增加 /ws/voice/{session_id} 顶级
   WebSocket 路由,委托给 session_router 的 handler

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 21:30:11 -08:00
hailin 74be945e4a feat: enable token-level streaming and fix duplicate message bubble
Backend:
- Add includePartialMessages: true to SDK query options
- Handle stream_event/content_block_delta for real-time text streaming
- Skip text/thinking blocks from complete assistant messages (already
  streamed via deltas) to avoid duplication
- Change default result summary to empty string

Flutter:
- Only show CompletedEvent summary when no assistant text was streamed
  (prevents duplicate message bubble)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:24:48 -08:00
hailin 5f827b0961 fix: revert ws/wss protocols - Kong OSS handles WS over http/https
Kong 3.7 OSS doesn't support ws/wss protocol identifiers (Enterprise only).
WebSocket upgrades are handled transparently over http/https protocols.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:02:53 -08:00
hailin 86d7cac631 fix: replace Socket.IO with raw WebSocket to fix 502 on /ws/agent
Socket.IO requires its own handshake protocol (EIO=4) which Kong cannot
proxy as a plain WebSocket upgrade, causing 502 Bad Gateway. Switch to
@nestjs/platform-ws (WsAdapter) with manual session room tracking so
Flutter's IOWebSocketChannel can connect directly.

Also add ws/wss protocols to Kong WebSocket routes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 16:52:43 -08:00
hailin 9cdc4933dc fix: add python-multipart dependency for voice-service
Required by FastAPI for form/file upload parsing. Missing dependency
may cause import errors and container restart loops.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 16:10:50 -08:00
hailin 3cb9ebd407 fix: release QueryRunner connections to prevent pool exhaustion
TenantAwareRepository.getRepository() was calling createQueryRunner()
without ever releasing it, causing database connection pool exhaustion.
This caused ops-service (and eventually other services) to hang on
all API requests once the pool filled up.

Replaced getRepository() with withRepository() pattern that wraps
operations in try/finally to always release the QueryRunner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 15:55:06 -08:00
hailin a6cd3c20d9 feat: add WebSocket robustness to voice call (heartbeat, reconnect, jitter buffer)
Addresses reliability gaps in the real-time voice WebSocket connection
between Flutter client and Python voice-service backend.

Backend (voice-service):
- Heartbeat: new _heartbeat_sender coroutine sends JSON ping text frames
  every 15s alongside the Pipecat pipeline; failed send = dead connection
- Session preservation: on WebSocket disconnect, sessions are now marked
  "disconnected" with a timestamp instead of being deleted, allowing
  reconnection within a configurable TTL (default 60s)
- Reconnect endpoint: POST /sessions/{id}/reconnect verifies the session
  is alive and in "disconnected" state, returns fresh websocket_url
- Reconnect-aware WS handler: detects "disconnected" sessions, cancels
  stale pipeline tasks, creates a new pipeline, sends "session.resumed"
- Background cleanup: asyncio loop every 30s removes sessions that have
  been disconnected longer than session_ttl
- Structured event protocol: text frames = JSON control messages
  (ping/pong/session.resumed/session.ended/error), binary = PCM audio
- New settings: session_ttl (60s), heartbeat_interval (15s),
  heartbeat_timeout (45s)

Flutter (agent_call_page.dart):
- Heartbeat monitoring: tracks last server ping timestamp, triggers
  reconnect if no ping received in 45s (3 missed intervals)
- Auto-reconnect: exponential backoff (1s→2s→4s→8s→16s), max 5 attempts;
  calls /reconnect endpoint to verify session, rebuilds WebSocket,
  resets audio buffer, restarts heartbeat
- Reconnecting UI: yellow warning banner "重新连接中... (N/5)" with
  spinner overlay during reconnection attempts
- WebSocket data routing: _onWsData distinguishes String (JSON control)
  from binary (audio) frames, handles ping/session.resumed/session.ended
- User-initiated disconnect guard: _userEndedCall flag prevents reconnect
  attempts when user intentionally hangs up
- session_id field compatibility: supports session_id/sessionId/id

Flutter (pcm_player.dart):
- Jitter buffer: queues incoming PCM chunks, starts playback only after
  accumulating 4800 bytes (150ms at 16kHz 16-bit mono) to smooth out
  network timing variance
- reset() method: clears buffer on reconnect to discard stale audio
- Buffer underrun handling: re-enters buffering phase if queue empties

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 07:32:19 -08:00
hailin d4391eef97 fix: run services as non-root user for SDK bypassPermissions
SDK blocks bypassPermissions when running as root for security.
Add non-root 'appuser' to Dockerfile.service and update volume
mounts to use /home/appuser/.claude paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 06:41:10 -08:00
hailin 04a18a7899 fix: use acceptEdits mode and mount .claude.json for SDK
- bypassPermissions blocked by SDK when running as root
- Switch to acceptEdits with canUseTool for programmatic control
- Mount .claude.json config file into container

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 06:37:31 -08:00
hailin db1d0620f2 debug: add stderr callback to SDK engine for error visibility
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 06:34:42 -08:00
hailin d40f66ce14 fix: use bypassPermissions mode for headless SDK execution
In a Docker container without TTY, permissionMode 'default' blocks
waiting for interactive permission prompts. Switch to bypassPermissions
with canUseTool callback for programmatic risk-based access control.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 06:30:38 -08:00
hailin 14e8d7019a fix: use dynamic import helper for ESM-only claude-agent-sdk
tsc with module=commonjs converts `await import()` to require(),
which breaks ESM-only packages. Use Function('return import()')
workaround to preserve native dynamic import at runtime.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 06:18:04 -08:00
hailin b963b7d4da feat: enable SDK subscription mode with OAuth credentials mount
- Mount ~/.claude/ into agent-service container for OAuth token access
- Switch default engine to claude_agent_sdk
- Remove ANTHROPIC_API_KEY from env in subscription mode so SDK uses OAuth
- Keep API key mode for per-tenant billing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 06:14:45 -08:00
hailin 9126225317 fix: disable TLS verification for Anthropic proxy (self-signed cert)
Follow iConsulting pattern: set NODE_TLS_REJECT_UNAUTHORIZED=0 when
ANTHROPIC_BASE_URL is configured, enabling connection through the
self-signed proxy at 67.223.119.33.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 05:52:02 -08:00
hailin 810dcd7def feat: switch default engine to claude_api with base URL support
- Change AGENT_ENGINE_TYPE from claude_code_cli to claude_api in docker-compose
- Add ANTHROPIC_BASE_URL env var support to claude-api-engine
- Add ANTHROPIC_BASE_URL to agent-service environment in docker-compose

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 05:45:08 -08:00
hailin 9a1ecf10ec fix: add restart policy, global error handlers, and fix tenant schema bug
- Add restart: unless-stopped to all 12 Docker services
- Add process.on(unhandledRejection/uncaughtException) to all 7 service main.ts
- Fix handleEventTrigger using tenantId UUID as schema name instead of slug lookup
- Wrap Redis event subscription callbacks in try/catch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 05:30:34 -08:00
hailin 1c291ce6c0 fix: Scheduler 用 slug 构建 tenant schema 名
tenant schema 是 it0_t_{slug}(如 it0_t_default),不是 it0_t_{uuid}。

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 05:03:04 -08:00
hailin 09274aa6af fix: Scheduler 查询 public.tenants 而非 it0_shared.tenants
数据库实际 schema 是 public.tenants,不是 it0_shared.tenants。

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 05:00:17 -08:00
hailin 840318f449 fix: Scheduler 缺少 tenant 上下文导致 ops-service 卡死
根因: @Cron 定时任务在 HTTP 请求上下文之外运行,
TenantAwareRepository 需要 AsyncLocalStorage 中的 tenant 信息,
每分钟抛 "Tenant context not initialized" 错误。

修复:
- scanCronOrders: 查 it0_shared.tenants 获取所有活跃租户,
  在 TenantContextService.run() 上下文中逐一执行
- handleEventTrigger: 从 Redis event 中提取 tenantId,
  同样包裹在 TenantContextService.run() 中
- 每个 tenant 循环加 try/catch 防止单个租户出错影响其他

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 04:55:52 -08:00
hailin 092a561867 feat: 完成 iAgent App 三大功能 + 修复租户上下文
## 功能一:设置页(完整实现)
- 新增浅色主题(lightTheme),支持深色/浅色/跟随系统三种模式
- app.dart 接入 themeMode 动态切换
- 设置页完整重写:个人信息编辑、修改密码、主题切换、通知开关
- 新增 settings_remote_datasource 对接后端 admin/settings API
- settings_providers 新增 AccountProfileNotifier 管理远程个人资料

## 功能二:语音通话(音频集成)
- 添加 flutter_sound 依赖,创建 PcmPlayer 流式 PCM 播放器
- agent_call_page 替换空壳:真实麦克风采集(record + GTCRN 降噪)
- 真实 PCM 16kHz 流式播放,基于 RMS 能量驱动波形动画
- 修复 WebSocket URL 路径:/ws/voice/ → /api/v1/voice/ws/
- voice_repository_impl 支持后端返回相对路径自动拼接

## 功能三:推送通知(WebSocket MVP)
- 添加 flutter_local_notifications + socket_io_client 依赖
- 创建 AppNotification 实体、NotificationService(Socket.IO 连接 comm-service)
- 通知 providers:列表管理 + 未读计数
- 登录后自动连接通知服务,登出断开
- 底部导航 Alerts 标签添加未读角标(Badge)
- AndroidManifest 添加 POST_NOTIFICATIONS 权限
- main.dart 初始化本地通知插件

## 修复:租户上下文未初始化(500错误)
- 根因:登录后未设置 currentTenantIdProvider,导致 X-Tenant-Id 头缺失
- Flutter 端:login() 成功后从 JWT 设置 tenantId,logout 时清除
- 后端:tenant-context.middleware 增加 JWT tenantId 回退逻辑
- AuthUser 模型新增 tenantId 字段解析

新增 5 个文件,修改 16 个文件,添加 3 个依赖包

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 01:10:52 -08:00
hailin 4e1b75483d fix: 修复 .gitignore 误忽略 Flutter data/models/ 源码导致构建失败
问题描述:
  在其他机器上构建报错:
  "Error when reading 'lib/features/auth/data/models/auth_response.dart': 系统找不到指定的路径"
  导致 AuthUser、AuthResponse 等类型找不到,编译失败。

根本原因:
  根目录 .gitignore 第75行 "models/" 规则本意是忽略 ML 模型大文件,
  但该规则匹配了所有目录名为 models/ 的路径,包括 Flutter 项目中
  DDD 架构的 data/models/ 源码目录(共 11 个 models/ 目录、10 个 .dart 文件)。
  这些文件在本地存在但从未被 Git 追踪,其他机器 pull 后缺失这些文件。

修复内容:
  1. 修改 .gitignore: 将宽泛的 "models/" 替换为精确的规则
     - packages/services/voice-service/models/ — voice-service 下载的 ML 模型
     - *.pt, *.pth, *.safetensors — PyTorch/HuggingFace 模型二进制文件
     - 不再影响 Flutter 的 data/models/ 源码目录

  2. 提交之前被忽略的 10 个 Flutter model 文件:
     - auth/data/models/auth_response.dart — 登录响应 (accessToken, refreshToken, user)
     - chat/data/models/chat_message_model.dart — 聊天消息模型
     - chat/data/models/session_model.dart — 会话模型
     - chat/data/models/stream_event_model.dart — SSE 流事件模型
     - servers/data/models/server_model.dart — 服务器状态模型
     - approvals/data/models/approval_model.dart — 审批请求模型
     - alerts/data/models/alert_event_model.dart — 告警事件模型
     - agent_call/data/models/voice_session_model.dart — 语音会话模型
     - standing_orders/data/models/standing_order_model.dart — 常设指令模型
     - tasks/data/models/task_model.dart — 任务模型

  3. 同时提交:
     - it0_app/test/widget_test.dart — Flutter 默认测试
     - packages/services/voice-service/src/models/__init__.py — Python 模块初始化

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 16:29:03 -08:00
hailin a568558585 feat: replace speech_to_text with GTCRN ML noise reduction + backend STT
Replace traditional on-device speech_to_text with a modern pipeline:
- Record audio via `record` package with hardware noise suppression
- Apply GTCRN neural denoising (sherpa-onnx, ICASSP 2024, 48K params)
- Trim silence, POST to backend /voice/transcribe (faster-whisper)

Changes:
- Add /transcribe endpoint to voice-service for audio file upload
- Add SpeechEnhancer wrapper for sherpa-onnx GTCRN model (523KB)
- Rewrite chat_page.dart voice input: record → denoise → transcribe
- Keep NoiseReducer.trimSilence for silence removal only
- Upgrade record to v6.2.0, add sherpa_onnx, path_provider

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 07:59:15 -08:00
hailin 89955f6db8 fix: remove SQL comment lines before splitting to prevent filtering CREATE TABLE statements
The previous approach split by semicolons then filtered statements starting
with '--', which incorrectly removed entire CREATE TABLE blocks that had
comment headers (e.g., '-- Agent Sessions\nCREATE TABLE...').

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:26:37 -08:00
hailin 7ff012cd91 fix: use underscores in tenant slug for valid PostgreSQL schema names
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:23:14 -08:00
hailin 5d81667ddd feat: add dual tenant registration (self-service + invitation)
Backend:
- Enhanced register endpoint to accept companyName for self-service
  tenant creation with schema provisioning and admin user setup
- Added TenantInvite entity with token-based invitation system
- Added invite CRUD endpoints to TenantController (create/list/revoke)
- Added public endpoints for invite validation and acceptance

Frontend:
- Created registration page with optional organization name field
- Created invitation acceptance page at /invite/[token]
- Added invite management UI to tenant detail page
- Updated login page with link to registration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:10:18 -08:00
hailin 7dbd2c1414 feat: add settings, roles, permissions, and metrics controllers
Implement remaining backend controllers for all web admin menu pages:
- SettingsController: general, notification, theme, account, API keys
- RoleController: CRUD roles with permission assignment
- PermissionController: permission matrix for RBAC management
- MetricsController: server metrics overview and per-server data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 01:03:34 -08:00
hailin 8f89b8121c fix: format tenant API response to match frontend DTO
Map flat quota fields to nested quota object and add userCount field
to match the frontend's expected Tenant interface.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 00:41:19 -08:00
hailin 3816d6841d fix: add users endpoint, admin route, and fix agent-config paths
- Add UsersController to auth-service for user CRUD (GET/POST/PUT/DELETE /api/v1/auth/users)
- Add Kong route /api/v1/admin -> auth-service for tenant management
- Remove AuthGuard from TenantController (Kong handles JWT)
- Fix frontend agent-config API paths from /api/v1/agent/config to /api/v1/agent-config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 00:35:57 -08:00
hailin 52b85f085e fix: decode JWT in middleware to populate req.user for RolesGuard
Kong validates the JWT but doesn't populate req.user on the backend.
The middleware now decodes the JWT payload to extract user info (id,
email, tenantId, roles) so RolesGuard can check role-based access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 00:25:32 -08:00
hailin f393a07092 fix: correct alert-rules API paths and remove audit ACL plugin
- Frontend alert-rules paths changed from /monitoring/alert-rules to
  /monitor/alerts/rules to match backend routes
- Removed Kong ACL plugin on audit-routes (JWT auth is sufficient)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 00:21:50 -08:00
hailin e98cf26587 fix: add missing columns to tenant schema template (runbook.updated_at, contact.role)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 00:18:46 -08:00
hailin e0ef15df1e fix: add SnakeNamingStrategy for TypeORM to match snake_case DB columns
TypeORM entities use camelCase properties (tenantId, passwordHash) but
database tables use snake_case columns (tenant_id, password_hash). The
naming strategy automatically converts between the two conventions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 00:10:08 -08:00
hailin a72cbd3778 fix: use any types in TenantContextMiddleware to avoid express dependency
The @it0/database package doesn't have @types/express, causing build
failures. Use any types for req/res/next parameters instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 00:00:55 -08:00
hailin 5b6e7ee363 fix: add TenantContextMiddleware to initialize tenant context from X-Tenant-Id header
All services using TenantAwareRepository require AsyncLocalStorage tenant
context to set the correct PostgreSQL search_path. The middleware reads
X-Tenant-Id from request headers and wraps the request with
TenantContextService.run(), using schema naming convention it0_t_{tenantId}.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 23:58:01 -08:00
hailin 806113554b fix: remove AuthGuard('jwt') from all service controllers
Kong handles JWT validation at the gateway level. Service-level
AuthGuard('jwt') fails because services don't register a Passport
JWT strategy (only auth-service does). Removed from 17 controllers
across ops, inventory, monitor, comm, audit, and agent services.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 23:42:37 -08:00
hailin c710303b60 fix: per-service JWT in Kong, fix auth-service tenant-aware repos
- Replace global JWT plugin with per-service JWT (skip auth-service)
  to fix auth routes being blocked by global JWT in DB-less mode
- Fix UserRepository and ApiKeyRepository to use standard TypeORM
  instead of TenantAwareRepository (users are global, not per-schema)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 23:31:32 -08:00
hailin 7dd7de4a22 fix: use COPY --chmod for Kong entrypoint (non-root image)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 23:24:37 -08:00
hailin 48e47975ca fix: configure Kong JWT auth flow with consumer credentials
- Add kid claim to auth-service JWT for Kong validation
- Add Kong consumer with JWT credential (shared secret via env)
- Add agent-config route to Kong for /api/v1/agent-config
- Kong Dockerfile uses entrypoint script to inject JWT_SECRET at runtime
- Fix frontend login path (/auth/login → /api/v1/auth/login)
- Extract tenantId from JWT on login and store as current_tenant
- Add auth guard in admin layout (redirect to /login if no token)
- Pass JWT_SECRET env var to Kong container in docker-compose

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 23:20:06 -08:00
hailin e5dcfa6113 feat: configure it0.szaiai.com and it0api.szaiai.com domains
- Update Kong CORS origins to allow it0.szaiai.com
- Update WebSocket URL to wss://it0api.szaiai.com
- Fix proxy route to read API_BASE_URL at request time
  (was being inlined at build time by Next.js standalone)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 22:54:17 -08:00
hailin d8cb2a9c6f fix: use standard TypeORM repos and header-based tenant extraction
- Replace TenantAwareRepository with standard @InjectRepository
  (TenantAwareRepository requires AsyncLocalStorage tenant context
  middleware which agent-service does not have)
- Replace @TenantId() decorator with @Headers('x-tenant-id')
  for direct HTTP header extraction
- Return defaults gracefully when no tenant is selected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 22:41:30 -08:00
hailin f897cfe240 fix: remove AuthGuard('jwt') from agent-service controllers
Agent-service does not have a registered Passport JWT strategy —
JWT validation is handled by Kong API gateway. The AuthGuard was
causing 500 "Unknown authentication strategy" errors on all
new controller endpoints.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 22:36:46 -08:00