Commit Graph

25 Commits

Author SHA1 Message Date
hailin 3020ecc465 fix(dockerfile): add referral-service package.json COPY in production stage
Without this, pnpm install --prod in the production stage doesn't know
about referral-service dependencies (@nestjs/core etc.) and they are missing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 21:37:23 -08:00
hailin 18049c47a3 fix(dockerfile): add referral-service package.json COPY step for pnpm install cache
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 21:26:01 -08:00
hailin 432cdc46a8 fix(dockerfile): use pnpm exec prisma generate in production stage
pnpm does not hoist workspace package binaries to /app/node_modules/.bin;
each package's .bin/ is only available within that package's node_modules.
Use 'pnpm exec prisma generate' from the service directory so pnpm can
resolve the prisma binary from the local node_modules/.bin symlink.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 20:16:19 -08:00
hailin 7edbea6ff0 fix(dockerfile): correct prisma generate path + add openssl for Alpine detection
Two fixes for Prisma on Alpine Linux:
1. Use /app/node_modules/.bin/prisma (workspace root) instead of
   node_modules/.bin/prisma — pnpm does not hoist binaries into each
   service's local node_modules/.bin, so the previous command silently
   skipped via || true, leaving only the default linux-musl (libssl 1.1) binary.
2. Add openssl to apk packages so Prisma can run 'openssl version' at
   runtime to detect OpenSSL 3.x and load the linux-musl-openssl-3.0.x
   engine binary instead of defaulting to the missing libssl.so.1.1 variant.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 20:10:07 -08:00
hailin ddf221cece fix(presence-service): generate prisma client in docker production stage
- Move prisma from devDependencies to dependencies so it is available
  after pnpm install --prod in the Dockerfile production stage
- Replace failed COPY of /app/node_modules/.prisma (pnpm virtual store
  path differs) with: COPY schema.prisma + RUN prisma generate in stage-1
- Only runs if schema.prisma exists (safe for all other services)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 18:10:43 -08:00
hailin 2ac60094bd fix(dockerfile): copy prisma generated client from builder to production stage
@prisma/client requires files generated by 'prisma generate' (.prisma/client/).
pnpm install --prod skips build scripts so the generated client is missing
in the production stage. Copy /app/node_modules/.prisma from builder to fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 18:08:29 -08:00
hailin 8d2fd3335a feat(telemetry): add presence-service + Flutter telemetry module
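## Backend: packages/services/presence-service (new microservice)

A complete DDD + Clean Architecture implementation, ported from the RWADurian presence-service,
with the following adaptations for the IT0 architecture:

### Core features
- Heartbeat endpoint: POST /api/v1/presence/heartbeat (JWT-verified, 60s interval)
  → Redis Sorted Set `presence:online_users` records each user's last-seen timestamp
  → A user counts as online within a 5-minute window by default (PRESENCE_WINDOW_SECONDS=300)
- Event reporting: POST /api/v1/analytics/events (batched, up to 50 events)
  → Writes to the presence_event_log table + updates presence_device_profile
  → Redis HyperLogLog `presence:dau:{date}` for real-time DAU estimation
- Query endpoints (require AdminGuard):
  - GET /api/v1/analytics/online-count: real-time online user count
  - GET /api/v1/analytics/online-history: historical online snapshots
  - GET /api/v1/analytics/dau: DAU statistics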
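
The heartbeat window described above can be sketched without Redis. This is only an in-memory illustration: the `PresenceWindow` class and its method names are hypothetical, and the real service is described as using a Redis sorted set, where `ZADD` and `ZCOUNT` would play the roles noted in the comments.

```typescript
// In-memory stand-in for the Redis sorted set `presence:online_users`.
// Assumed layout: member = userId, score = last heartbeat time in seconds.
const PRESENCE_WINDOW_SECONDS = 300;

class PresenceWindow {
  private readonly lastSeen = new Map<string, number>();

  // Roughly what `ZADD presence:online_users <now> <userId>` would record.
  heartbeat(userId: string, nowSeconds: number): void {
    this.lastSeen.set(userId, nowSeconds);
  }

  // Roughly what `ZCOUNT presence:online_users <now - window> +inf` would return.
  onlineCount(nowSeconds: number): number {
    const cutoff = nowSeconds - PRESENCE_WINDOW_SECONDS;
    let count = 0;
    this.lastSeen.forEach((seenAt) => {
      if (seenAt >= cutoff) count += 1;
    });
    return count;
  }
}
```

With a 60s heartbeat and a 300s window, a client can miss several consecutive heartbeats before it stops counting as online.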

### IT0 adaptation notes
- JWT payload: `sub` = UUID userId (not RWADurian's userSerialNum)
  → JwtAuthGuard: request.user = { userId: payload.sub, roles, tenantId }
- AdminGuard: now checks `roles.includes('admin')` (instead of type==='admin')
- Removed Kafka EventPublisherService (IT0 has no Kafka)
- Removed Prometheus MetricsService (IT0 has no Prometheus)
- Table prefix changed to `presence_` (avoids conflicts with other services)
- userId column is VarChar(36) (UUID format, previously VarChar(20))
- Redis DB=10 for isolation (separate key space)
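
The two guard changes above amount to a payload mapping and a role check. The sketch below shows them as plain functions rather than NestJS guards; the interface and function names are hypothetical, and the field types (including defaulting `roles` to an empty array and `tenantId` to `null`) are assumptions not stated in the commit.

```typescript
// Shapes follow the commit text; the field types are assumptions.
interface JwtPayload {
  sub: string; // UUID userId
  roles?: string[];
  tenantId?: string;
}

interface RequestUser {
  userId: string;
  roles: string[];
  tenantId: string | null;
}

// What JwtAuthGuard is described as attaching to request.user.
function toRequestUser(payload: JwtPayload): RequestUser {
  return {
    userId: payload.sub,
    roles: payload.roles ?? [],
    tenantId: payload.tenantId ?? null,
  };
}

// What AdminGuard is described as checking.
function isAdmin(user: RequestUser): boolean {
  return user.roles.includes("admin");
}
```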

### Database tables (public schema)
- presence_event_log: event log (append-only)
- presence_device_profile: device snapshot (upsert, one row per device)
- presence_daily_active_users: daily DAU statistics
- presence_online_snapshots: per-minute online-count snapshots

### Scheduled jobs (@nestjs/schedule)
- Every minute: capture an online-count snapshot → presence_online_snapshots
- Daily at 01:05 (UTC+8): compute the previous day's DAU → presence_daily_active_users
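
The daily job has to decide which calendar day "the previous day" is in UTC+8, regardless of the server's own time zone. A minimal sketch of that date arithmetic follows; the function names are hypothetical, and the `YYYY-MM-DD` format for the `{date}` part of the key is an assumption.

```typescript
const UTC8_OFFSET_MS = 8 * 60 * 60 * 1000;
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the calendar date (YYYY-MM-DD) of "yesterday" as seen in UTC+8.
// UTC+8 is treated as a fixed offset, so plain millisecond arithmetic is enough.
function previousDayInUtc8(now: Date): string {
  const shifted = new Date(now.getTime() + UTC8_OFFSET_MS - DAY_MS);
  return shifted.toISOString().slice(0, 10);
}

// Hypothetical helper: builds the HyperLogLog key for a date string.
function dauKey(date: string): string {
  return `presence:dau:${date}`;
}
```

For example, a run at 2026-03-08 01:05 in UTC+8 (17:05 UTC on 2026-03-07) would target `presence:dau:2026-03-07`.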

---

## Flutter: it0_app/lib/core/telemetry (new module)

### File structure
- telemetry_service.dart: singleton entry point that coordinates all components
- models/telemetry_event.dart: event model; toServerJson() promotes device fields to top-level columns
- models/device_context.dart: device context (Android/iOS info)
- models/telemetry_config.dart: remote config (sampling rate/switches, supports remote sync)
- collectors/device_info_collector.dart: collects device info via device_info_plus
- storage/telemetry_storage.dart: SharedPreferences-backed queue (up to 500 entries)
- uploader/telemetry_uploader.dart: batch upload to /api/v1/analytics/events
- session/session_manager.dart: WidgetsBindingObserver that watches foreground/background transitions
- session/session_events.dart: session event constants
- presence/heartbeat_service.dart: periodic heartbeat POST /api/v1/presence/heartbeat
- presence/presence_config.dart: heartbeat config (interval/requiresAuth)
- telemetry.dart: barrel export
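
The storage and uploader files above imply a capped queue that is drained in batches. The real module is written in Dart; the sketch below uses TypeScript only to illustrate the idea. The class name is hypothetical, and dropping the oldest event when the queue is full is an assumption, since the commit does not state the overflow policy.

```typescript
const MAX_QUEUE_SIZE = 500; // storage cap named in the commit
const MAX_BATCH_SIZE = 50; // per-request batch limit named in the commit

class BoundedEventQueue<T> {
  private items: T[] = [];

  // Assumption: when the queue is full, the oldest event is dropped.
  enqueue(event: T): void {
    this.items.push(event);
    if (this.items.length > MAX_QUEUE_SIZE) {
      this.items.shift();
    }
  }

  // Removes and returns up to MAX_BATCH_SIZE events, oldest first.
  takeBatch(): T[] {
    return this.items.splice(0, MAX_BATCH_SIZE);
  }

  size(): number {
    return this.items.length;
  }
}
```

An uploader built on this would call `takeBatch()` repeatedly until it returns an empty array, sending each batch to /api/v1/analytics/events.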

### Integration points
- app_router.dart _tryRestore(): TelemetryService().initialize() runs before auth
- auth_provider.dart login/loginWithOtp: setUserId + setAccessToken + resumeAfterLogin
- auth_provider.dart tryRestoreSession: restores userId + accessToken
- auth_provider.dart logout: pauseForLogout + clearUserId + clearAccessToken

### New dependencies
- device_info_plus: ^10.1.0
- equatable: ^2.0.5

---

## Infrastructure

### Dockerfile.service
- Added COPY of presence-service/package.json in both the builder and production stages

### docker-compose.yml
- Added presence-service container (ports 3011/13011)
  - DATABASE_URL: postgresql://... (connection-string format required by Prisma)
  - REDIS_HOST/PORT/DB: 10 (dedicated Redis DB for presence)
  - APP_PORT=3011, JWT_SECRET, PRESENCE_WINDOW_SECONDS=300
- api-gateway depends_on now includes presence-service

### kong.yml (dbless declarative)
- Added presence-service service (http://presence-service:3011)
  - presence-routes: /api/v1/presence
  - analytics-routes: /api/v1/analytics
- JWT plugin enabled for the whole presence-service (auth at the Kong layer)

### DB migration
- packages/shared/database/src/migrations/010-create-presence-tables.sql
  (4 presence_-prefixed tables + full indexes, idempotent via IF NOT EXISTS)
- run-migrations.ts: runSharedSchema() now also executes 010-create-presence-tables.sql

---

## Deployment steps (server)

1. git pull
2. Run the presence table migration (first time only):
   docker exec it0-postgres psql -U it0 -d it0 \
     -f /path/to/010-create-presence-tables.sql
   or via the migration runner:
   cd /home/ceshi/it0 && node packages/shared/database/dist/run-migrations.js
3. Rebuild and start presence-service:
   docker compose build presence-service api-gateway
   docker compose up -d presence-service api-gateway
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 17:44:01 -08:00
hailin c7f3807148 fix(billing-service): add to Dockerfile.service and update pnpm lockfile
- Dockerfile.service: add COPY lines for billing-service/package.json in
  both build and production stages so pnpm install includes its deps
  (omission caused 'node_modules missing' turbo build error)
- pnpm-lock.yaml: regenerated after running pnpm install to include all
  billing-service dependencies (stripe, alipay-sdk, wechat-pay-v3, etc.)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 21:27:35 -08:00
hailin d5df46c2d6 fix: add /data/versions directory creation in Dockerfile
Ensure /data/versions/android and /data/versions/ios directories are
created with correct appuser ownership during image build, fixing
EACCES permission error when version-service starts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 08:27:53 -08:00
hailin f6dffe02c5 feat: add version-service for IT0 App version management
New NestJS microservice (port 3009) providing complete version management
API for IT0 App, designed to integrate with the existing mobile-upgrade
frontend (update.szaiai.com).

Backend — packages/services/version-service/ (9 new files):
- AppVersion entity: platform (ANDROID/IOS), versionName, buildNumber,
  changelog, downloadUrl, fileSize, isForceUpdate, isEnabled, minOsVersion
- REST controller with 8 endpoints:
  GET/POST /api/v1/versions — list (with platform/disabled filters) & create
  GET/PUT/DELETE /api/v1/versions/:id — single CRUD
  PATCH /api/v1/versions/:id/toggle — enable/disable
  POST /api/v1/versions/upload — multipart APK/IPA upload (500MB limit)
  POST /api/v1/versions/parse — extract version info from APK/IPA
- File storage: /data/versions/{platform}/ via Docker volume
- APK/IPA parsing: app-info-parser package
- Database: public.app_versions table (non-tenant, platform-level)
- No JWT auth (internal version management, consistent with existing apps)
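
The entity and the list filters above can be sketched as a type plus a pure filter function. Field names come from the commit text, but the field types, the `id` field (implied by the `/:id` routes), and the `ListFilter` / `listVersions` names are assumptions made for illustration, not the service's actual code.

```typescript
type Platform = "ANDROID" | "IOS";

// Field names follow the commit text; the types are assumptions.
interface AppVersion {
  id: string; // assumed, implied by /api/v1/versions/:id
  platform: Platform;
  versionName: string;
  buildNumber: number;
  changelog: string;
  downloadUrl: string;
  fileSize: number;
  isForceUpdate: boolean;
  isEnabled: boolean;
  minOsVersion: string;
}

interface ListFilter {
  platform?: Platform;
  includeDisabled?: boolean; // hypothetical name for the "disabled" filter
}

// Keeps versions that match the platform (if given) and are enabled,
// unless disabled versions were explicitly requested.
function listVersions(all: AppVersion[], filter: ListFilter = {}): AppVersion[] {
  return all.filter(
    (v) =>
      (filter.platform === undefined || v.platform === filter.platform) &&
      (filter.includeDisabled === true || v.isEnabled)
  );
}
```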

Infrastructure changes:
- Dockerfile.service: added version-service package.json COPY lines
- docker-compose.yml: version-service container (13009:3009), version_data
  volume, api-gateway depends_on
- kong.yml: version-service route (/api/v1/versions), CORS origin for
  update.szaiai.com (mobile-upgrade frontend domain)

Deployment note: nginx needs /downloads/versions/ location + client_max_body_size 500m

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 07:48:31 -08:00
hailin 3ed20cdf08 refactor: clean up agent SSH setup after fixing host-local routing
- Remove iproute2/NET_ADMIN (no longer needed)
- Remove ip route hack from entrypoint.sh
- rwa-colocation-2 server record updated to use Docker gateway IP
  since 14.215.128.96 is a host-local NIC on the IT0 server

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 18:11:44 -08:00
hailin 836d4d2a03 fix: add iproute2 to container for ip route command
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 18:06:35 -08:00
hailin f0ad6e09e6 fix: move entrypoint.sh to project root (deploy/ is in .dockerignore)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 12:14:31 -08:00
hailin bad7f4802d fix: use root entrypoint to copy SSH key then drop to appuser
The bind-mounted SSH key is owned by host uid (1000/node) but the
service runs as appuser (uid 1001). Use su-exec in entrypoint.sh
to copy the key as root, fix ownership, then drop privileges.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 12:13:55 -08:00
hailin 329916e1f6 fix: correct SSH key permissions in agent-service container
Mount host key to /tmp/host-ssh-key (read-only), then copy to
appuser's .ssh directory with correct ownership at container start.
Fixes "Permission denied" due to uid mismatch on bind mount.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 12:00:02 -08:00
hailin 795e8a11c5 feat: enable SSH access from agent-service container
- Add openssh-client to Dockerfile.service (alpine)
- Create .ssh directory with correct permissions for appuser
- Mount host SSH key into agent-service container (read-only)

This allows the Agent SDK to SSH into managed servers using the Bash tool.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 11:55:54 -08:00
hailin e02b350043 fix: create /data/claude-tenants dir with appuser ownership in Dockerfile
Without this, the SDK engine fails to create tenant HOME directories
because the Docker volume mount point doesn't exist and appuser lacks
write permissions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 02:52:57 -08:00
hailin bed17f32f9 fix: install bash in Alpine container for Agent SDK shell access
The Claude Agent SDK Bash tool requires a POSIX shell. Alpine only has
busybox ash, causing "No suitable shell found" errors. Install bash
and set SHELL=/bin/bash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 06:52:23 -08:00
hailin d4391eef97 fix: run services as non-root user for SDK bypassPermissions
SDK blocks bypassPermissions when running as root for security.
Add non-root 'appuser' to Dockerfile.service and update volume
mounts to use /home/appuser/.claude paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 06:41:10 -08:00
hailin 34caa25c71 fix: copy SQL migrations to service dist path for schema provisioning
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:21:32 -08:00
hailin 895b361bd8 fix: copy SQL migration templates to Docker dist for schema provisioning
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:19:19 -08:00
hailin b620898bc8 fix: revert to node:18 (cached), enable crypto via NODE_OPTIONS
Docker Hub is unreachable from server, so node:20 can't be pulled.
Reverting to node:18-alpine (already cached) and using
--experimental-global-webcrypto to enable globalThis.crypto.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 19:17:23 -08:00
hailin bbb288025a fix: upgrade to Node.js 20 for globalThis.crypto support
crypto.randomUUID() is used throughout services but crypto is not
a global in Node.js 18. Node.js 20 provides globalThis.crypto.
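
For reference, `randomUUID` is also exported by the built-in `node:crypto` module, so an explicit import avoids relying on the `crypto` global. The sketch below is not what this commit does (it upgrades the base image instead), and the `newId` helper is a hypothetical name.

```typescript
import { randomUUID } from "node:crypto";

// Uses the module export directly, so it does not depend on globalThis.crypto
// being defined by the runtime.
function newId(): string {
  return randomUUID();
}
```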

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 19:15:36 -08:00
hailin 39718a9a09 fix: resolve runtime errors for NestJS, Kong, and voice-service
- Dockerfile.service: fix entry point path (dist/services/{name}/src/main)
  due to tsconfig paths widening rootDir during compilation
- Kong config: remove unsupported ws/wss protocols (WebSocket works
  automatically over http/https in Kong 3.7)
- voice-service: fix pipecat import path for v0.0.30 API
  (pipecat.transports.network.websocket_server with lowercase class names)
- voice-service: add openai dependency required by pipecat anthropic service

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 19:00:03 -08:00
hailin 9120f4927e fix: add Dockerfiles and fix docker-compose build configuration
- Add shared Dockerfile.service for all 7 NestJS microservices using
  multi-stage build with pnpm workspace support
- Add Dockerfile for web-admin (Next.js standalone output)
- Add .dockerignore files for root and web-admin
- Fix docker-compose.yml: use monorepo root as build context with
  SERVICE_NAME build arg instead of per-service Dockerfiles
- Fix postgres/redis missing network config (services couldn't reach them)
- Use .env variables for DB credentials instead of hardcoded values
- Add JWT_REFRESH_SECRET and REDIS_URL to services that were missing them
- Add DB init script volume mount for postgres
- Remove deprecated version: '3.8' from all compose files
- Add output: 'standalone' to next.config.js for optimized Docker builds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 04:31:23 -08:00