Commit Graph

9 Commits

Author SHA1 Message Date
hailin 7b71a4f2fc fix: properly close WebSocket with subscription cancel + fire-and-forget
Root cause: IOWebSocketChannel.sink.close() can hang indefinitely
(dart-lang/web_socket_channel#185). Previous fix used unawaited close
but didn't cancel the stream subscription, so the old listener could
still push events to _messageController.

Fix: Extract _closeCurrentConnection() that:
1. Cancels StreamSubscription first (stops duplicate events immediately)
2. Fire-and-forget sink.close(goingAway) (frees underlying socket)

This follows the workaround recommended in the official issue tracker.
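
The two-step order above can be sketched outside Dart. The following is a minimal Python `asyncio` illustration, not the code from this commit: `HangingSocket`, `Client`, and `demo` are hypothetical names, and the socket is a stand-in whose close handshake never completes.

```python
import asyncio


class HangingSocket:
    """Hypothetical socket whose close handshake never completes."""

    def __init__(self):
        self.close_requested = False

    async def close(self):
        self.close_requested = True
        await asyncio.Event().wait()  # never set, so this awaits forever


class Client:
    def __init__(self):
        self.socket = None
        self.listener = None      # task forwarding socket events to the app
        self._closing = set()     # keeps references to background close tasks

    def close_current_connection(self):
        # 1. Cancel the listener first so the old connection stops emitting events.
        if self.listener is not None:
            self.listener.cancel()
            self.listener = None
        # 2. Start close() without awaiting it, so a hung handshake cannot block.
        if self.socket is not None:
            task = asyncio.create_task(self.socket.close())
            self._closing.add(task)
            task.add_done_callback(self._closing.discard)
            self.socket = None


async def demo():
    client = Client()
    old_socket = client.socket = HangingSocket()
    old_listener = client.listener = asyncio.create_task(asyncio.sleep(3600))
    client.close_current_connection()   # returns immediately
    await asyncio.sleep(0.01)           # give the event loop a moment to run
    return old_listener.cancelled(), old_socket.close_requested, client.socket is None
```

Running `asyncio.run(demo())` returns promptly even though `close()` never finishes, which is the property the fix relies on.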

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 03:45:43 -08:00
hailin 45eb6bc453 fix: use unawaited close to prevent WebSocket reconnect hang
The await on sink.close() blocks indefinitely when the server doesn't
respond to the close handshake. Use fire-and-forget with unawaited()
so the new connection can proceed immediately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 03:41:13 -08:00
hailin 3185438f36 fix: close previous WebSocket before opening new connection
When sending a second message in the same session, the old WebSocket
connection was not closed, causing both connections to subscribe to the
same session room. This resulted in each text event being received twice,
producing garbled/duplicated output text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 03:37:16 -08:00
hailin 5e31b15dcf fix: use IOWebSocketChannel for headers support
WebSocketChannel.connect does not accept a headers parameter in
web_socket_channel 2.4.0. Use IOWebSocketChannel.connect instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 16:45:35 -08:00
hailin 803cea0fe4 fix: pass JWT token in WebSocket connection headers
WebSocket connections to /ws/agent were rejected by Kong (401)
because the Authorization header was not included. Now reads
access_token from secure storage and passes it in the WebSocket
upgrade request headers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 16:43:31 -08:00
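hailin 666b173906 fix: eliminate Unhandled Exception at the root: async void interceptors + global error fallback
Root cause: the Dio interceptor onError/onRequest signatures return void.
Marking them async turns them into Future<void> that nobody awaits, so every
exception thrown inside becomes an Unhandled Exception crash.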

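Fix:
- RetryInterceptor: onError now schedules synchronously; the retry logic moves
  to a separate _retry() method with every path wrapped in try/catch
- DedupInterceptor: prevent a Completer from being completed twice; retry
  requests skip dedup to avoid conflicting with the original request
- TokenInterceptor: the async lambdas in onRequest and onError are all
  wrapped in try/catch and fall back to handler.next() on exception
- main.dart: three layers of global error fallback:
  1) FlutterError.onError catches framework errors
  2) PlatformDispatcher.onError catches platform channel errors
  3) runZonedGuarded catches any remaining asynchronous exceptions
- receiveTimeout/sendTimeout no longer trigger a retry (the server has already received the request)
- Timeouts adjusted: connect 10s, send 30s, receive 30s
- Dashboard cards aligned to equal height with IntrinsicHeight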

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 04:37:39 -08:00
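hailin 4e55e9a616 feat: bring the network layer up to big-tech standard: 401 concurrency lock, request dedup, structured error logging
## 1. TokenRefreshLock (fix for the concurrent 401 refresh race)
- Add `core/network/token_refresh_lock.dart`
- Mutex implemented with a Completer: when several requests get 401 at once,
  only the first triggers refreshToken(); the rest wait for the same result
- Prevents 5 pages hitting 401 at once → 5 refreshes → 4 failures that log the user out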
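
The shared-result lock described above is a single-flight pattern. Below is a minimal Python `asyncio` sketch of that idea, offered as an illustration rather than the contents of `token_refresh_lock.dart`: a shared task plays the role of the Dart Completer, and `refresh_once`, `fake_refresh`, and `demo` are hypothetical names.

```python
import asyncio


class TokenRefreshLock:
    """Single-flight wrapper: concurrent callers share one refresh call."""

    def __init__(self, refresh):
        self._refresh = refresh      # hypothetical coroutine function returning a token
        self._in_flight = None       # the shared refresh task, if one is running

    async def refresh_once(self):
        if self._in_flight is None:
            # First caller starts the refresh; later callers reuse this task.
            self._in_flight = asyncio.create_task(self._refresh())
            self._in_flight.add_done_callback(self._clear)
        # shield() keeps one cancelled waiter from cancelling the shared task.
        return await asyncio.shield(self._in_flight)

    def _clear(self, _task):
        self._in_flight = None       # the next 401 starts a fresh refresh


async def demo():
    calls = 0

    async def fake_refresh():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)    # simulated network round trip
        return "new-token"

    lock = TokenRefreshLock(fake_refresh)
    # Five requests hit 401 at the same time.
    tokens = await asyncio.gather(*(lock.refresh_once() for _ in range(5)))
    return calls, tokens
```

In `demo`, five concurrent callers trigger a single call to `fake_refresh` and all receive the same token.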

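## 2. DedupInterceptor (request dedup)
- Add `core/network/dedup_interceptor.dart`
- While a GET to the same URL is in flight, later requests reuse the first one's result
- Prevents: rapid taps on retry, duplicate loads on page switches, repeated pull-to-refresh
- GET only; write operations such as POST/PUT/DELETE always pass through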

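## 3. ErrorLogInterceptor + ErrorLogger (structured error logging)
- Add `core/network/error_log_interceptor.dart`: Dio interceptor
- Add `core/services/error_logger.dart`: persistent log service
- Each failed request records: timestamp, method, URL, status code, error type, retry count
- The latest 50 entries are stored locally in SharedPreferences, with summary statistics supported
- In debug mode, entries are also printed via debugPrint
- Hook reserved for flushing to Sentry/Crashlytics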
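
The bounded log with summary statistics can be pictured as a fixed-size buffer. This is a minimal Python sketch under stated assumptions: it keeps entries in memory only (persistence to SharedPreferences is not modeled), the field names mirror the list above, and `ErrorEntry`, `ErrorLog`, and the shape of `summary()` are hypothetical.

```python
from collections import Counter, deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorEntry:
    timestamp: float
    method: str
    url: str
    status_code: Optional[int]   # None when no HTTP response was received
    error_type: str
    retry_count: int


class ErrorLog:
    def __init__(self, max_entries=50):
        # A deque with maxlen drops the oldest entry once the limit is reached.
        self._entries = deque(maxlen=max_entries)

    def record(self, entry):
        self._entries.append(entry)

    def entries(self):
        return list(self._entries)

    def summary(self):
        # Assumed summary shape: number of retained entries per error type.
        return dict(Counter(e.error_type for e in self._entries))
```

Recording a 51st entry silently evicts the oldest one, so the log never grows past the configured size.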

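## 4. Dio interceptor pipeline tuning
Interceptor order adjusted to a big-tech standard pipeline:
1. DedupInterceptor: dedup (first, keeps duplicate requests out of the pipeline)
2. TokenInterceptor: injects the token + 401 refresh (with concurrency lock)
3. TenantInterceptor: injects X-Tenant-Id
4. RetryInterceptor: exponential backoff retry
5. ErrorLogInterceptor: error logging (last, records the final failure)

Removed LogInterceptor (replaced by ErrorLogInterceptor; request bodies are
no longer printed in release mode, which cost performance)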

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 04:05:53 -08:00
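hailin 94652857cd feat: production-grade API error handling: retry interceptor, friendly error messages, network monitoring, WebSocket backoff
## Problem
Users saw raw DioException stack traces (e.g. "DioException [unknown]: null Error:
HttpException: Connection reset by peer"), and with no retry mechanism any network jitter surfaced as an error.

## Changes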

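### 1. RetryInterceptor (automatic retry with exponential backoff)
- Add `core/network/retry_interceptor.dart`
- Retries automatically on: connect timeout, send timeout, Connection reset, 502/503/504/429
- Exponential backoff (800ms → 1.6s → 3.2s) + random jitter to avoid a retry stampede
- At most 3 retries; non-transient errors (401/403/404) are not retried
- Integrated into dio_client, with tuned timeouts: connect 8s, send 15s, receive 20s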
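
The retry schedule above can be written down as a small delay function. The following Python sketch is an illustration, not the code in `retry_interceptor.dart`: the base delay, retry limit, and status codes come from the list above, while the jitter fraction (up to 20%) and the names `should_retry` and `backoff_delay` are assumptions.

```python
import random

RETRYABLE_STATUS = {429, 502, 503, 504}   # transient statuses listed above
MAX_RETRIES = 3
BASE_DELAY_S = 0.8                         # 800 ms before the first retry


def should_retry(status_code, attempt):
    """attempt is the number of retries already made (0 for the first failure)."""
    return attempt < MAX_RETRIES and status_code in RETRYABLE_STATUS


def backoff_delay(attempt, jitter=0.2, rng=random.random):
    """Delay in seconds before retry number attempt + 1.

    Doubles per attempt (0.8, 1.6, 3.2) and adds up to `jitter` (an assumed
    20%) of random extra delay so clients do not retry in lockstep.
    """
    base = BASE_DELAY_S * (2 ** attempt)
    return base * (1 + jitter * rng())
```

Passing a fixed `rng` such as `lambda: 0.0` removes the randomness, which makes the schedule easy to check in tests.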

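### 2. ErrorHandler overhaul (friendly Chinese error messages)
- Rewrite `core/errors/error_handler.dart`, add static method `friendlyMessage()`
- Every DioExceptionType maps to a specific Chinese message:
  - Connection reset → "连接被服务器重置,请稍后重试" ("Connection was reset by the server, please retry later")
  - Connection refused → "服务器拒绝连接,请确认服务是否启动" ("Server refused the connection, please check that the service is running")
  - Timeout → "连接超时,服务器无响应" ("Connection timed out, the server is not responding")
  - 401 → "登录已过期,请重新登录" ("Login has expired, please log in again")
  - 403/404/429/500/502/503 each have their own message
- Add TimeoutFailure type
- All Failure.message values default to Chinese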

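### 3. Network connectivity monitoring + offline banner
- Add `core/network/connectivity_provider.dart`: probes server reachability every 30 seconds
- Add `core/widgets/offline_banner.dart`: yellow warning banner "网络连接不可用" ("Network connection unavailable")
- Integrated into ScaffoldWithNav, so every page shows offline status at the top automatically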

### 4. Unified error display (no more e.toString())
- Add `core/widgets/error_view.dart`: unified error UI (icon + friendly text + retry button)
- Replace the inline error Columns on 6 pages with ErrorView:
  tasks_page / servers_page / alerts_page / approvals_page / standing_orders_page
- Replace 3 uses of _SummaryCardError(message: e.toString()) in the dashboard
- Replace e.toString() in 4 providers: chat / auth / settings / approvals
- No e.toString() left anywhere in the project (only time.minute.toString() remains, for time formatting)

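### 5. WebSocket reconnect improvements
- Exponential backoff (1s → 2s → 4s → ... → 60s cap) + random jitter
- At most 10 automatic reconnects; stops once the limit is exceeded
- disconnect() prevents automatic reconnect
- Add reconnect() method for manual reconnect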

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 04:01:04 -08:00
hailin 00f8801d51 Initial commit: IT0 AI-powered server cluster operations platform
Full-stack monorepo with DDD + Clean Architecture:
- Backend: 7 NestJS microservices + 5 shared libraries (TypeScript)
- Mobile: Flutter app with Riverpod (Dart)
- Web Admin: Next.js dashboard with Zustand + React Query
- Voice: Python voice service (STT/TTS/VAD)
- Infra: Docker Compose, K8s manifests, Turborepo build

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 22:54:37 -08:00