Commit Graph

31 Commits

Author SHA1 Message Date
hailin 8d2fd3335a feat(telemetry): add presence-service + Flutter telemetry module
## Backend — packages/services/presence-service (新微服务)

完整的 DDD + Clean Architecture 实现,移植自 RWADurian presence-service,
针对 IT0 架构做了以下适配:

### 核心功能
- 心跳接口: POST /api/v1/presence/heartbeat(JWT 验证,60s 间隔)
  → Redis Sorted Set `presence:online_users` 记录在线时间戳
  → 默认 5 分钟窗口判断在线(PRESENCE_WINDOW_SECONDS=300)
- 事件上报: POST /api/v1/analytics/events(批量,最多 50 条)
  → 写入 presence_event_log 表 + 更新 presence_device_profile
  → Redis HyperLogLog `presence:dau:{date}` 实时 DAU 估算
- 查询接口(需 AdminGuard):
  - GET /api/v1/analytics/online-count  — 实时在线人数
  - GET /api/v1/analytics/online-history — 历史在线快照
  - GET /api/v1/analytics/dau — DAU 统计

### IT0 适配要点
- JWT payload: `sub` = UUID userId(非 RWADurian 的 userSerialNum)
  → JwtAuthGuard: request.user = { userId: payload.sub, roles, tenantId }
- AdminGuard: 改为检查 `roles.includes('admin')`(非 type==='admin')
- 移除 Kafka EventPublisherService(IT0 无 Kafka)
- 移除 Prometheus MetricsService(IT0 无 Prometheus)
- 表前缀改为 `presence_`(避免与其他服务冲突)
- userId 字段 VarChar(36)(UUID 格式,非原来的 VarChar(20))
- Redis DB=10 隔离(独立 key 空间)

### 数据库表(public schema)
- presence_event_log       — 事件流水(append-only)
- presence_device_profile  — 设备快照(upsert,每台设备一行)
- presence_daily_active_users — DAU 日统计
- presence_online_snapshots   — 在线人数每分钟快照

### 定时任务(@nestjs/schedule)
- 每分钟: 采集在线人数快照 → presence_online_snapshots
- 每天 01:05 (UTC+8): 计算前一天 DAU → presence_daily_active_users

---

## Flutter — it0_app/lib/core/telemetry (新模块)

### 文件结构
- telemetry_service.dart      — 单例入口,统筹所有组件
- models/telemetry_event.dart — 事件模型,toServerJson() 将设备字段提升为顶层列
- models/device_context.dart  — 设备上下文(Android/iOS 信息)
- models/telemetry_config.dart — 远程配置(采样率/开关,支持远端同步)
- collectors/device_info_collector.dart — 采集 device_info_plus 设备信息
- storage/telemetry_storage.dart  — SharedPreferences 队列(最多 500 条)
- uploader/telemetry_uploader.dart — 批量上传到 /api/v1/analytics/events
- session/session_manager.dart    — WidgetsBindingObserver 监听前后台切换
- session/session_events.dart     — 会话事件常量
- presence/heartbeat_service.dart — 定时心跳 POST /api/v1/presence/heartbeat
- presence/presence_config.dart   — 心跳配置(间隔/requiresAuth)
- telemetry.dart                  — barrel 导出

### 集成点
- app_router.dart _tryRestore(): TelemetryService().initialize() 在 auth 之前
- auth_provider.dart login/loginWithOtp: setUserId + setAccessToken + resumeAfterLogin
- auth_provider.dart tryRestoreSession: 恢复 userId + accessToken
- auth_provider.dart logout: pauseForLogout + clearUserId + clearAccessToken

### 新增依赖
- device_info_plus: ^10.1.0
- equatable: ^2.0.5

---

## 基础设施

### Dockerfile.service
- 在 builder 和 production 阶段均添加 presence-service/package.json 的 COPY

### docker-compose.yml
- 新增 presence-service 容器(端口 3011/13011)
  - DATABASE_URL: postgresql://... (Prisma 所需连接串格式)
  - REDIS_HOST/PORT/DB: 10(presence 独立 Redis DB)
  - APP_PORT=3011, JWT_SECRET, PRESENCE_WINDOW_SECONDS=300
- api-gateway depends_on 新增 presence-service

### kong.yml (dbless 声明式)
- 新增 presence-service 服务(http://presence-service:3011)
  - presence-routes: /api/v1/presence
  - analytics-routes: /api/v1/analytics
- 对整个 presence-service 启用 JWT 插件(Kong 层鉴权)

### DB 迁移
- packages/shared/database/src/migrations/010-create-presence-tables.sql
  — 4 张 presence_ 前缀表 + 完整索引(IF NOT EXISTS 幂等)
- run-migrations.ts: runSharedSchema() 中新增执行 010-create-presence-tables.sql

---

## 部署步骤(服务器)

1. git pull
2. 执行 presence 表迁移(首次):
   docker exec it0-postgres psql -U it0 -d it0 \
     -f /path/to/010-create-presence-tables.sql
   或通过 migration runner:
   cd /home/ceshi/it0 && node packages/shared/database/dist/run-migrations.js
3. 重建并启动 presence-service:
   docker compose build presence-service api-gateway
   docker compose up -d presence-service api-gateway

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 17:44:01 -08:00
hailin 7d5840c245 feat(openclaw): Phase 1 — server pool + agent instance deployment infrastructure
## inventory-service
- New: pool_servers table (public schema, platform-admin managed)
- New: PoolServer entity, PoolServerRepository, PoolServerController
- CRUD endpoints at /api/v1/inventory/pool-servers
- Internal /deploy-creds endpoint (x-internal-api-key protected) for SSH key retrieval
- increment/decrement endpoints for capacity tracking

## agent-service
- New: agent_instances table (tenant schema)
- New: AgentInstance entity, AgentInstanceRepository, AgentInstanceController
- New: AgentInstanceDeployService — SSH-based docker deployment
  - Queries pool server availability from inventory-service
  - AES-256 encrypts OpenClaw gateway token at rest
  - Allocates host ports in range 20000-29999
  - Fires docker run for it0hub/openclaw-bridge:latest
  - Async deploy with error capture
- Added ssh2 dependency for SSH execution
- Added INVENTORY_SERVICE_URL, INTERNAL_API_KEY, VAULT_MASTER_KEY to docker-compose

## openclaw-bridge (new package)
- packages/openclaw-bridge/ — custom Docker image
- Two processes via supervisord: OpenClaw gateway + IT0 Bridge (Node.js)
- IT0 Bridge exposes REST API on port 3000:
  GET /health, GET /status, POST /task, GET /sessions, GET /metrics
- Connects to OpenClaw gateway at ws://127.0.0.1:18789 via WebSocket RPC
- Sends heartbeat to IT0 agent-service every 60s
- Dockerfile: multi-stage build (openclaw source + bridge TS compilation)

## Web Admin
- New: /server-pool page — list/add/edit/delete pool servers with capacity bars
- New: /openclaw-instances page — cross-tenant instance monitoring with status filter
- Sidebar: added 服务器池 (Database icon) + OpenClaw 实例 (Boxes icon) to platform_admin nav

## Flutter App
- my_agents_page: rewritten to show real AgentInstance data from /api/v1/agent/instances
- Added AgentInstance model with status-driven UI (running/deploying/stopped/error)
- Status badges with color coding + spinner for deploying state
- Summary chips showing running vs stopped counts
- api_endpoints.dart: added agentInstances endpoint

## Design docs
- OPENCLAW_INTEGRATION_PLAN.md: complete architecture document with all confirmed decisions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 11:11:21 -08:00
hailin d5930ff4c8 feat(app): redesign navigation — floating robot FAB + 4-tab layout
- Add animated robot avatar widget (CustomPainter, 5 states: idle/thinking/executing/speaking/alert)
- Add FloatingRobotFab that mirrors chatProvider AgentStatus as robot animation state
- Replace 5-tab nav (dashboard/chat/tasks/alerts/settings) with 4-tab (home/my-agents/billing/profile)
- Chat is now pushed full-screen from the robot FAB with slide-up transition
- HomePage: active agent status card + official agent horizontal scroll + quick tips
- MyAgentsPage: empty state with 3-step guide + template grid; shows list when agents exist
- ProfilePage: merged settings + prominent billing entry (replaces old SettingsPage as tab)
- ChatPage AppBar: robot avatar replaces plain text title, reflects real-time agent state
- Add agentConfigs endpoint to ApiEndpoints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 09:42:17 -08:00
hailin 71ea80972d feat(auth): add SMS OTP verification for phone registration and login
- auth-service: add SmsService (Aliyun SMS) + RedisProvider for OTP storage
- POST /api/v1/auth/sms/send — send OTP (rate limited 1/min per phone)
- POST /api/v1/auth/sms/verify — verify OTP only
- POST /api/v1/auth/login/otp — passwordless login with phone + OTP
- register endpoint now requires smsCode when registering with phone
- Web Admin register page: add OTP input + 60s countdown button for phone mode
- Flutter login page: add 验证码登录 tab with phone + OTP flow
- SMS enabled via ALIYUN_ACCESS_KEY_ID/SECRET + SMS_ENABLED=true env vars
- Falls back to mock mode (logs code) when env vars not set

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 06:43:27 -08:00
hailin 0aac693b5d fix(app): re-check connectivity on foreground resume to clear false-offline banner
When backgrounded, the periodic TCP ping times out causing isOnline=false.
On resume, immediately re-check so the banner clears as soon as the app
is foregrounded rather than waiting up to 30s for the next scheduled check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 08:08:01 -08:00
hailin e6f864d409 fix(version-service+gateway+app): fix APK download 404 and SHA-256 false failure
Three coordinated fixes to make in-app APK download work end-to-end:

1. version-service/main.ts: serve uploaded files as static assets via
   NestExpressApplication.useStaticAssets('/data/versions', prefix:
   '/downloads/versions'), so GET /downloads/versions/{platform}/{file}
   returns the actual APK stored in the Docker volume.

2. kong.yml: add /downloads/versions route to Kong so requests from
   the Flutter app can reach version-service through the API gateway.
   Previously only /api/v1/versions and /api/app/version were routed;
   the download URL returned by the check endpoint was unreachable (404).

3. download_manager.dart: skip SHA-256 verification when sha256Expected
   is empty string. The check endpoint always returns sha256:"" because
   version-service doesn't store file hashes. The previous code compared
   actual_hash == "" which always failed, causing the downloaded file to
   be deleted after a successful download.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 06:04:27 -08:00
hailin 0f328b9794 feat(it0_app): add detailed logging to VersionChecker for update diagnosis
Add verbose debugPrint logs throughout VersionChecker to diagnose why
app update check isn't triggering:
- Log apiBaseUrl and full request URL + query params before the request
- Log response status code and raw response body
- Log explicit needUpdate=true/false with version details
- Log version code comparison (server versionCode vs local buildNumber)
- Add stack trace to all catch blocks for better error diagnosis

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 05:51:45 -08:00
hailin 9ed80cd0bc feat: implement complete commercial monetization loop (Phases 1-4)
## Phase 1 - Token Metering + Quota Enforcement

### Usage Tracking
- agent-service: add UsageRecord entity (per-tenant schema) tracking
  inputTokens/outputTokens/costUsd per AI task
- Modify all 3 AI engines (claude-api, claude-code-cli, claude-agent-sdk)
  to emit separate input/output token counts in the `completed` event
- claude-api-engine: costUsd = (input*3 + output*15) / 1,000,000
  (claude-sonnet-4-5 pricing: $3/MTok in, $15/MTok out)
- agent.controller: persist UsageRecord and publish `usage.recorded`
  event to Redis Streams on every task completion (non-blocking)
- shared/events: new events UsageRecordedEvent, SubscriptionChangedEvent,
  QuotaExceededEvent, PaymentReceivedEvent

### Quota Enforcement
- TenantInfo: add maxServers, maxUsers, maxStandingOrders,
  maxAgentTokensPerMonth fields
- TenantContextMiddleware: rewritten to query public.tenants table for
  real quota values; 5-min in-memory cache; plan-based fallback on error
- TenantContextService: getTenant() returns null instead of throwing;
  added getTenantOrThrow() for strict callers
- inventory-service/server.controller: 429 when maxServers exceeded
- ops-service/standing-order.controller: 429 when maxStandingOrders exceeded
- auth-service/auth.service: 429 when maxUsers exceeded
- 002-create-tenant-schema-template.sql: add usage_records table

## Phase 2 - billing-service (New Microservice, port 3010)

### Domain Layer (public schema, all UUIDs)
Entities: Plan, Subscription, Invoice, InvoiceItem, Payment, PaymentMethod,
UsageAggregate

Domain services:
- SubscriptionLifecycleService: full state machine (trialing -> active ->
  past_due -> cancelled/expired); upgrades immediate, downgrades at period end
- InvoiceGeneratorService: monthly invoice = base fee + overage charges;
  proration item for mid-cycle upgrades
- OverageCalculatorService: (totalTokens - includedTokens) * overageRate

### Infrastructure (all repos use DataSource directly, NOT TenantAwareRepository)
- PlanRepository, SubscriptionRepository, InvoiceRepository (atomic
  transaction for invoice+items), PaymentRepository (payments + methods),
  UsageAggregateRepository (UPSERT via ON CONFLICT for atomic accumulation)

### Application Use Cases
- CreateSubscriptionUseCase: called on tenant registration
- ChangePlanUseCase: upgrade (immediate + proration) or downgrade (scheduled)
- CancelSubscriptionUseCase: immediate or at-period-end
- GenerateMonthlyInvoiceUseCase: cron target (1st of month 00:05 UTC);
  generates invoices, renews periods, applies scheduled downgrades
- AggregateUsageUseCase: Redis Streams consumer group billing-service,
  upserts monthly usage aggregates from usage.recorded events
- CheckTokenQuotaUseCase: hard limit enforcement per plan
- CreatePaymentSessionUseCase + HandlePaymentWebhookUseCase

### REST API
- GET  /api/v1/billing/plans
- GET/POST /api/v1/billing/subscription (+ /upgrade, /cancel)
- GET  /api/v1/billing/invoices (paginated)
- GET  /api/v1/billing/invoices/:id
- POST /api/v1/billing/invoices/:id/pay
- GET  /api/v1/billing/usage/current + /history
- CRUD /api/v1/billing/payment-methods
- POST /api/v1/billing/webhooks/{stripe,alipay,wechat,crypto}

### Plan Seed (auto on startup via PlanSeedService)
- free:       $0/mo,    100K tokens,  no overage,  hard limit 100%
- pro:        $49.99/mo, 1M tokens,  $8/MTok,  hard limit 150%
- enterprise: $199.99/mo, 10M tokens, $5/MTok, no hard limit

## Phase 3 - Payment Provider Integration

### PaymentProviderRegistry (Strategy Pattern, mirrors EngineRegistry)
All providers use @Optional() injection; unconfigured providers omitted

- StripeProvider: PaymentIntent API; webhook via stripe.webhooks.constructEvent
- AlipayProvider: alipay-sdk; Native QR (precreate); RSA2 signature verify
- WeChatPayProvider: v3 REST; Native Pay code_url; AES-256-GCM decrypt;
  HMAC-SHA256 request signing and webhook verification
- CryptoProvider: Coinbase Commerce; hosted checkout; HMAC-SHA256 verify

### WebhookController
All 4 webhook endpoints are public (no JWT) for payment provider callbacks.
rawBody: true enabled in main.ts for signature verification.

## Infrastructure Changes
- docker-compose.yml: billing-service container (port 13010);
  added as dependency of api-gateway
- kong.yml: /api/v1/billing routes (JWT); /api/v1/billing/webhooks (public)
- 005-create-billing-tables.sql: 7 billing tables + invoice sequence +
  ALTER tenants to add quota columns
- run-migrations.ts: 005 runs as part of shared schema step

## Phase 4 - Frontend

### Web Admin (Next.js)
New pages:
- /billing: subscription card + token usage bar + warning banner + invoices
- /billing/plans: comparison grid with USD/CNY toggle + upgrade/downgrade flow
- /billing/invoices: paginated table with Pay Now button
Sidebar: Billing group (CreditCard icon, 3 sub-items)
i18n: billing keys added to en + zh sidebar translations

### Flutter App
New feature module it0_app/lib/features/billing/:
- BillingOverviewPage: plan card + token LinearProgressIndicator +
  latest invoice + upgrade button
- BillingProvider (FutureProvider): parallel fetch subscription/quota/invoice
Settings page: "订阅与用量" entry card
Router: /settings/billing sub-route

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 21:09:17 -08:00
hailin 8a48e92970 fix: use domain names for API access, China IP for LiveKit
Flutter app now uses https://it0api.szaiai.com (nginx reverse proxy)
instead of direct IP:port. LiveKit URL uses China IP 14.215.128.96
for lower latency from domestic mobile clients.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 21:44:25 -08:00
hailin 68ee2516d5 fix: use host networking for voice services to eliminate docker-proxy overhead
Bridge mode created 600+ docker-proxy processes for LiveKit's UDP port-range
mappings (30000-30100, 50000-50200). Switch livekit-server, voice-agent, and
voice-service to network_mode: host for zero-overhead networking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 19:58:32 -08:00
hailin 94a14b3104 feat: migrate voice call from WebSocket/PCM to LiveKit WebRTC
实时语音对话架构迁移:WebSocket → LiveKit WebRTC

## 背景
原语音通话架构基于 FastAPI WebSocket 传输原始 PCM,管道串行执行
(VAD → 批量STT → Agent → 攒句 → 批量TTS),首音频延迟约 6 秒。
迁移到 LiveKit Agents 框架后,利用 WebRTC 传输 + 流水线并行,
预期延迟降至 1.5-2 秒。

## 架构
Flutter App ←── WebRTC (Opus/UDP) ──→ LiveKit Server ←──→ Voice Agent
  livekit_client                      (自部署, Go)       (Python, LiveKit Agents SDK)
                                                          ├─ VAD (Silero)
                                                          ├─ STT (faster-whisper / OpenAI)
                                                          ├─ LLM (自定义插件 → agent-service)
                                                          └─ TTS (Kokoro / OpenAI)

关键设计:LLM 不直接调用 Claude API,而是通过自定义插件代理到现有
agent-service,保留 Tool Use、会话历史、租户隔离等能力。

## 新增服务

### voice-agent (packages/services/voice-agent/)
LiveKit Agent Worker,包含:
- agent.py: 入口,prewarm() 预加载模型,entrypoint() 编排会话
- plugins/agent_llm.py: 自定义 LLM 插件,代理 agent-service API
  - POST /api/v1/agent/tasks 创建任务
  - WS /ws/agent 订阅流式事件 (stream_event)
  - 跨轮复用 session_id 保持对话上下文
- plugins/whisper_stt.py: 本地 faster-whisper STT (批量识别)
- plugins/kokoro_tts.py: 本地 Kokoro-82M TTS (24kHz PCM)
- config.py: pydantic-settings 配置

### LiveKit Server (deploy/docker/)
- livekit.yaml: 信令端口 7880, RTC TCP 7881, UDP 50000-50200
- docker-compose.yml: 新增 livekit-server + voice-agent 容器

### LiveKit Token 端点
- voice-service/src/api/livekit_token.py:
  POST /api/v1/voice/livekit/token
  生成 Room JWT,嵌入 auth_header 到 AgentDispatch metadata

## Flutter 客户端改造
- agent_call_page.dart: 从 ~814 行简化到 ~380 行
  - 替换: WebSocketChannel, AudioRecorder, PcmPlayer, 手动心跳/重连
  - 使用: Room.connect(), setMicrophoneEnabled(true), LiveKit 事件监听
  - 波形动画改用 participant.audioLevel
- pubspec.yaml: 添加 livekit_client: ^2.3.0
- app_config.dart: 增加 livekitUrl 字段
- api_endpoints.dart: 增加 livekitToken 端点

## 配置说明 (环境变量)
- STT_PROVIDER: local (默认, faster-whisper) / openai
- TTS_PROVIDER: local (默认, Kokoro) / openai
- WHISPER_MODEL: base (默认) / small / medium / large
- WHISPER_LANGUAGE: zh (默认)
- KOKORO_VOICE: zf_xiaoxiao (默认)
- DEVICE: cpu (默认) / cuda

## 不变的部分
- agent-service: 完全不改,voice-agent 通过现有 API 调用
- voice-service 核心: pipeline/STT/TTS/VAD 保留 (Twilio 备用)
- Kong 网关: 现有路由不变
- 数据库: 无 schema 变更

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 08:55:33 -08:00
hailin 3025910095 ui: transparent compact AppBar (64dp → 44dp)
- AppBar background transparent, merges with scaffold for seamless look
- toolbarHeight reduced from 64dp to 44dp (~20dp screen space saved)
- scrolledUnderElevation: 0 prevents Material 3 shadow on scroll
- Icons 24→20px with VisualDensity.compact for tighter action buttons
- Title fontSize 16 w600, less visual weight
- Both dark and light themes updated consistently

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 05:20:23 -08:00
hailin f5d9b1f04f feat: add app upgrade system with self-hosted APK update support
- Add core/updater module: version checker, download manager (resumable + SHA-256),
  APK installer, app market detector, self-hosted updater with progress dialogs
- Add Android native MethodChannels for APK installation and market detection
- Add FileProvider config and REQUEST_INSTALL_PACKAGES permission
- Wire UpdateService singleton into main.dart initialization
- Add auto-check on home entry with cooldown + app resume detection
- Add manual "检查更新" button and dynamic version display in settings
- Fix chat page: bottom overflow, bash spinner persistence, collapsible results
- Merge standing orders into tasks page as second tab

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 22:35:01 -08:00
hailin 57fabb4653 fix: set interleaved=true for PcmPlayer streaming playback
FlutterSoundPlayer.feedUint8FromStream() requires interleaved mode.
With interleaved=false, every feed() call threw:
  "Cannot feed with UInt8 with non interleaved mode"

- feedUint8FromStream (Uint8List) → requires interleaved: true
- feedFromStream (Float32List) → requires interleaved: false
Since we feed raw PCM bytes (Uint8List), interleaved must be true.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 06:59:06 -08:00
hailin e706a4cdc7 fix: enable simultaneous playback + recording in voice call
Root cause: PcmPlayer called openPlayer() without audio session config,
so Android defaulted to earpiece-only mode. When the mic was actively
recording, playback was silently suppressed — the agent's TTS audio was
sent successfully over WebSocket but never reached the speaker.

Changes:

1. PcmPlayer (pcm_player.dart):
   - Added audio_session package for proper audio session management
   - Configure AudioSession with playAndRecord category so mic + speaker
     work simultaneously
   - Set voiceCommunication usage to enable Android hardware AEC (echo
     cancellation) — prevents feedback loops when speaker is active
   - defaultToSpeaker routes output to loudspeaker instead of earpiece
   - Restored setSpeakerOn() method stub (used by UI toggle)

2. AgentCallPage (agent_call_page.dart):
   - Fixed fire-and-forget bug: _pcmPlayer.feed() returns Future but was
     called without await, causing interleaved feedUint8FromStream calls
   - Added _feedChain serializer to guarantee sequential audio feeding

3. Dependencies:
   - Added audio_session package to pubspec.yaml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 06:48:16 -08:00
hailin 7b71a4f2fc fix: properly close WebSocket with subscription cancel + fire-and-forget
Root cause: IOWebSocketChannel.sink.close() can hang indefinitely
(dart-lang/web_socket_channel#185). Previous fix used unawaited close
but didn't cancel the stream subscription, so the old listener could
still push events to _messageController.

Fix: Extract _closeCurrentConnection() that:
1. Cancels StreamSubscription first (stops duplicate events immediately)
2. Fire-and-forget sink.close(goingAway) (frees underlying socket)

This follows the workaround recommended in the official issue tracker.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 03:45:43 -08:00
hailin 45eb6bc453 fix: use unawaited close to prevent WebSocket reconnect hang
The await on sink.close() blocks indefinitely when the server doesn't
respond to the close handshake. Use fire-and-forget with unawaited()
so the new connection can proceed immediately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 03:41:13 -08:00
hailin 3185438f36 fix: close previous WebSocket before opening new connection
When sending a second message in the same session, the old WebSocket
connection was not closed, causing both connections to subscribe to the
same session room. This resulted in each text event being received twice,
producing garbled/duplicated output text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 03:37:16 -08:00
hailin 5e31b15dcf fix: use IOWebSocketChannel for headers support
WebSocketChannel.connect does not accept headers parameter in
web_socket_channel 2.4.0. Use IOWebSocketChannel.connect instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 16:45:35 -08:00
hailin 803cea0fe4 fix: pass JWT token in WebSocket connection headers
WebSocket connections to /ws/agent were rejected by Kong (401)
because the Authorization header was not included. Now reads
access_token from secure storage and passes it in the WebSocket
upgrade request headers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 16:43:31 -08:00
hailin a6cd3c20d9 feat: add WebSocket robustness to voice call (heartbeat, reconnect, jitter buffer)
Addresses reliability gaps in the real-time voice WebSocket connection
between Flutter client and Python voice-service backend.

Backend (voice-service):
- Heartbeat: new _heartbeat_sender coroutine sends JSON ping text frames
  every 15s alongside the Pipecat pipeline; failed send = dead connection
- Session preservation: on WebSocket disconnect, sessions are now marked
  "disconnected" with a timestamp instead of being deleted, allowing
  reconnection within a configurable TTL (default 60s)
- Reconnect endpoint: POST /sessions/{id}/reconnect verifies the session
  is alive and in "disconnected" state, returns fresh websocket_url
- Reconnect-aware WS handler: detects "disconnected" sessions, cancels
  stale pipeline tasks, creates a new pipeline, sends "session.resumed"
- Background cleanup: asyncio loop every 30s removes sessions that have
  been disconnected longer than session_ttl
- Structured event protocol: text frames = JSON control messages
  (ping/pong/session.resumed/session.ended/error), binary = PCM audio
- New settings: session_ttl (60s), heartbeat_interval (15s),
  heartbeat_timeout (45s)

Flutter (agent_call_page.dart):
- Heartbeat monitoring: tracks last server ping timestamp, triggers
  reconnect if no ping received in 45s (3 missed intervals)
- Auto-reconnect: exponential backoff (1s→2s→4s→8s→16s), max 5 attempts;
  calls /reconnect endpoint to verify session, rebuilds WebSocket,
  resets audio buffer, restarts heartbeat
- Reconnecting UI: yellow warning banner "重新连接中... (N/5)" with
  spinner overlay during reconnection attempts
- WebSocket data routing: _onWsData distinguishes String (JSON control)
  from binary (audio) frames, handles ping/session.resumed/session.ended
- User-initiated disconnect guard: _userEndedCall flag prevents reconnect
  attempts when user intentionally hangs up
- session_id field compatibility: supports session_id/sessionId/id

Flutter (pcm_player.dart):
- Jitter buffer: queues incoming PCM chunks, starts playback only after
  accumulating 4800 bytes (150ms at 16kHz 16-bit mono) to smooth out
  network timing variance
- reset() method: clears buffer on reconnect to discard stale audio
- Buffer underrun handling: re-enters buffering phase if queue empties

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 07:32:19 -08:00
hailin 4651291468 style: 导航栏去掉蓝色药丸背景,改为图标/文字高亮
- indicatorColor: transparent 去掉 Material 3 默认的选中背景
- 选中项:图标 + 文字改为 primary 紫色,字重 w600
- 未选中项:图标 + 文字灰色 (textSecondary),字重 w400
- 与微信/支付宝/飞书的导航栏风格一致

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 05:09:34 -08:00
hailin 666b173906 fix: 根治 Unhandled Exception — async void 拦截器 + 全局错误兜底
根本原因:Dio interceptor 的 onError/onRequest 签名是 void,
标 async 后变成 Future<void> 但没人 await,内部异常全部变成
Unhandled Exception 崩溃。

修复:
- RetryInterceptor: onError 改为同步调度,retry 逻辑移到独立
  _retry() 方法并用 try/catch 包裹全部路径
- DedupInterceptor: 防止 Completer 重复 complete,retry 请求
  跳过去重避免与原始请求冲突
- TokenInterceptor: onRequest 和 onError 的 async lambda 全部
  包裹 try/catch,异常时 fallback 到 handler.next()
- main.dart: 三层全局错误兜底 —
  1) FlutterError.onError 捕获框架错误
  2) PlatformDispatcher.onError 捕获平台通道错误
  3) runZonedGuarded 捕获所有漏网的异步异常
- receiveTimeout/sendTimeout 不再触发重试(服务器已收到请求)
- 超时调整: connect 10s, send 30s, receive 30s
- 仪表盘卡片 IntrinsicHeight 等高对齐

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 04:37:39 -08:00
hailin 4e55e9a616 feat: 补齐大厂级网络层 — 401并发锁、请求去重、结构化错误日志
## 1. TokenRefreshLock(401 并发刷新竞态修复)
- 新增 `core/network/token_refresh_lock.dart`
- 使用 Completer 实现互斥锁:多个请求同时 401 时,
  仅第一个触发 refreshToken(),其余等待同一结果
- 防止 5 个页面同时 401 → 5 次 refresh → 4 次失败踢出用户

## 2. DedupInterceptor(请求去重)
- 新增 `core/network/dedup_interceptor.dart`
- 相同 GET URL 在飞行中时,后续请求复用第一个的结果
- 防止:用户快速点重试、页面切换重复加载、下拉刷新连点
- 仅限 GET,POST/PUT/DELETE 等写操作始终放行

## 3. ErrorLogInterceptor + ErrorLogger(结构化错误日志)
- 新增 `core/network/error_log_interceptor.dart` — Dio 拦截器
- 新增 `core/services/error_logger.dart` — 持久化日志服务
- 每个失败请求记录:时间戳、方法、URL、状态码、错误类型、重试次数
- 本地 SharedPreferences 存储最近 50 条,支持 summary 统计
- debug 模式同步 debugPrint 输出
- 预留 Sentry/Crashlytics flush 接口

## 4. Dio 拦截器管线优化
拦截器顺序调整为大厂标准管线:
1. DedupInterceptor — 去重(最先,防止重复请求进入管线)
2. TokenInterceptor — 注入 token + 401 刷新(带并发锁)
3. TenantInterceptor — 注入 X-Tenant-Id
4. RetryInterceptor — 指数退避重试
5. ErrorLogInterceptor — 错误日志(最后,记录最终失败)

移除 LogInterceptor(被 ErrorLogInterceptor 替代,且不再在
release 模式下打印请求 body 造成性能损耗)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 04:05:53 -08:00
hailin 94652857cd feat: 生产级 API 错误处理 — 重试拦截器、友好错误提示、网络监测、WebSocket 退避
## 问题
用户看到原始 DioException 堆栈(如 "DioException [unknown]: null Error:
HttpException: Connection reset by peer"),且无重试机制,网络抖动即报错。

## 变更

### 1. RetryInterceptor(指数退避自动重试)
- 新增 `core/network/retry_interceptor.dart`
- 自动重试:连接超时、发送超时、Connection reset、502/503/504/429
- 指数退避(800ms → 1.6s → 3.2s)+ 随机抖动防雪崩
- 最多 3 次重试,非瞬态错误(401/403/404)不重试
- 集成到 dio_client,优化超时:connect 8s、send 15s、receive 20s

### 2. ErrorHandler 全面升级(友好中文错误提示)
- 重写 `core/errors/error_handler.dart`,新增 `friendlyMessage()` 静态方法
- 所有 DioExceptionType 映射为具体中文:
  - Connection reset → "连接被服务器重置,请稍后重试"
  - Connection refused → "服务器拒绝连接,请确认服务是否启动"
  - Timeout → "连接超时,服务器无响应"
  - 401 → "登录已过期,请重新登录"
  - 403/404/429/500/502/503 各有独立提示
- 新增 TimeoutFailure 类型
- 所有 Failure.message 默认中文

### 3. 网络连接监测 + 离线 Banner
- 新增 `core/network/connectivity_provider.dart` — 每30秒探测服务器可达性
- 新增 `core/widgets/offline_banner.dart` — 黄色警告横幅 "网络连接不可用"
- 集成到 ScaffoldWithNav,所有页面顶部自动显示离线状态

### 4. 统一错误展示(杜绝 e.toString())
- 新增 `core/widgets/error_view.dart` — 统一错误 UI(图标 + 友好文案 + 重试按钮)
- 替换 6 个页面的内联错误 Column 为 ErrorView:
  tasks_page / servers_page / alerts_page / approvals_page / standing_orders_page
- 替换 dashboard 的 3 处 _SummaryCardError(message: e.toString())
- 替换 4 个 provider 的 e.toString(): chat / auth / settings / approvals
- 全项目零 e.toString() 残留(仅剩 time.minute.toString() 时间格式化)

### 5. WebSocket 重连增强
- 指数退避(1s → 2s → 4s → ... → 60s 上限)+ 随机抖动
- 最多 10 次自动重连,超限后停止
- disconnect() 阻止自动重连
- 新增 reconnect() 手动重连方法

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 04:01:04 -08:00
hailin 9f44878fea fix: unify all pages to Chinese + fix bottom nav selected state
1. 所有页面英文文本统一替换为中文(仪表盘、对话、任务、告警、
   服务器、常驻指令、审批、终端、设置)
2. 底部导航栏添加 selectedIndex 追踪当前路由,点击后正确高亮
3. 导航图标添加 outlined/filled 选中态区分
4. 设置页重构为大厂风格(圆角图标分组 + 主题底部弹窗选择)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 01:59:09 -08:00
hailin d1993a1175 feat: add auto-login with token restore on app startup
App 启动时从 SecureStorage 读取已存储的 JWT,解析用户信息自动恢复登录状态,
无需每次重新输入密码。Token 过期则自动尝试 refresh,refresh 失败才跳转登录页。

- 新增 tryRestoreSession() 从 JWT payload 解码用户信息
- 新增 _isTokenExpired() 检查 token 是否过期(预留 60s 缓冲)
- refreshToken() 成功后恢复 AuthState + tenant 上下文
- 新增 /splash 启动页,尝试恢复后决定跳转 dashboard 或 login

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 01:47:20 -08:00
hailin 092a561867 feat: 完成 iAgent App 三大功能 + 修复租户上下文
## 功能一:设置页(完整实现)
- 新增浅色主题(lightTheme),支持深色/浅色/跟随系统三种模式
- app.dart 接入 themeMode 动态切换
- 设置页完整重写:个人信息编辑、修改密码、主题切换、通知开关
- 新增 settings_remote_datasource 对接后端 admin/settings API
- settings_providers 新增 AccountProfileNotifier 管理远程个人资料

## 功能二:语音通话(音频集成)
- 添加 flutter_sound 依赖,创建 PcmPlayer 流式 PCM 播放器
- agent_call_page 替换空壳:真实麦克风采集(record + GTCRN 降噪)
- 真实 PCM 16kHz 流式播放,基于 RMS 能量驱动波形动画
- 修复 WebSocket URL 路径:/ws/voice/ → /api/v1/voice/ws/
- voice_repository_impl 支持后端返回相对路径自动拼接

## 功能三:推送通知(WebSocket MVP)
- 添加 flutter_local_notifications + socket_io_client 依赖
- 创建 AppNotification 实体、NotificationService(Socket.IO 连接 comm-service)
- 通知 providers:列表管理 + 未读计数
- 登录后自动连接通知服务,登出断开
- 底部导航 Alerts 标签添加未读角标(Badge)
- AndroidManifest 添加 POST_NOTIFICATIONS 权限
- main.dart 初始化本地通知插件

## 修复:租户上下文未初始化(500错误)
- 根因:登录后未设置 currentTenantIdProvider,导致 X-Tenant-Id 头缺失
- Flutter 端:login() 成功后从 JWT 设置 tenantId,logout 时清除
- 后端:tenant-context.middleware 增加 JWT tenantId 回退逻辑
- AuthUser 模型新增 tenantId 字段解析

新增 5 个文件,修改 16 个文件,添加 3 个依赖包

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 01:10:52 -08:00
hailin 51430adafc fix: 修复登录问题 — 重置用户密码 + 配置生产环境API地址
问题排查过程:
  1. 用户 hailin@it0.com 存在于 public.users,is_active=true
  2. 直接调用 auth-service login 接口返回 401 Invalid credentials
  3. 确认是密码不匹配 — 将密码重置为 admin123 (与 admin 账号相同)
  4. 重置后登录成功,Kong Gateway 路由也正常

App配置修改:
  - development: 端口从 8000 改为 18000 (匹配 Kong 映射)
  - production: 指向服务器 http://154.84.135.121:18000
  - 默认使用 production 配置 (之前是 development)

登录凭据:
  - admin@it0.com / admin123 (管理员)
  - hailin@it0.com / admin123 (运维员,请登录后修改密码)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 16:56:38 -08:00
hailin a568558585 feat: replace speech_to_text with GTCRN ML noise reduction + backend STT
Replace traditional on-device speech_to_text with a modern pipeline:
- Record audio via `record` package with hardware noise suppression
- Apply GTCRN neural denoising (sherpa-onnx, ICASSP 2024, 48K params)
- Trim silence, POST to backend /voice/transcribe (faster-whisper)

Changes:
- Add /transcribe endpoint to voice-service for audio file upload
- Add SpeechEnhancer wrapper for sherpa-onnx GTCRN model (523KB)
- Rewrite chat_page.dart voice input: record → denoise → transcribe
- Keep NoiseReducer.trimSilence for silence removal only
- Upgrade record to v6.2.0, add sherpa_onnx, path_provider

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 07:59:15 -08:00
hailin 00f8801d51 Initial commit: IT0 AI-powered server cluster operations platform
Full-stack monorepo with DDD + Clean Architecture:
- Backend: 7 NestJS microservices + 5 shared libraries (TypeScript)
- Mobile: Flutter app with Riverpod (Dart)
- Web Admin: Next.js dashboard with Zustand + React Query
- Voice: Python voice service (STT/TTS/VAD)
- Infra: Docker Compose, K8s manifests, Turborepo build

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 22:54:37 -08:00