Commit Graph

54 Commits

Author SHA1 Message Date
hailin f305a8cd97 feat(session): broadcast participant_joined event via gRPC for real-time UI updates
Backend changes (session-coordinator):
- Add PublishParticipantJoined method to JoinSessionMessageRouterClient interface
- Implement PublishParticipantJoined in MessageRouterClient to broadcast events
- Call PublishParticipantJoined in join_session.go after participant joins
- Add detailed logging for debugging event broadcast

Android changes (service-party-android):
- Add detailed logging in TssRepository for session event handling
- Add detailed logging in MainViewModel for participant_joined processing
- Log activeSession state, event matching, and participant updates

This enables the initiator's waiting screen to receive real-time updates
when participants join the session, matching the expected behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 08:34:47 -08:00
hailin 136a5ba851 fix(android): change address format from Cosmos to EVM and fix balance query
Changes:
- Change address derivation from deriveKavaAddress to deriveEvmAddress
  in TssRepository.kt (3 locations)
- Add AddressUtils.isEvmAddress() and getEvmAddress() helper methods
  to handle both old Cosmos and new EVM address formats
- Fix balance query for old wallets by deriving EVM address from
  public key when needed (MainViewModel.fetchBalanceForShare)
- Add retry logic for optimistic lock conflicts in join_session.go
  to prevent party_index collision during concurrent joins

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 07:48:52 -08:00
hailin 1708a03aaf fix(session): distinguish keygen vs sign in CanStart() and AllPartiesReady()
- Keygen/co-keygen: must have exactly N participants joined
- Sign (co-sign/persistent): only check all registered participants joined

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 08:34:40 -08:00
hailin 75b15acda2 docs: add BREAKING CHANGE warnings for co-sign modifications
Add detailed comments to warn about changes that affect persistent sign flow:
- session_coordinator.go: ValidateSessionCreation now allows T <= count <= N for sign
- mpc_session.go: CanStart/AllPartiesReady now check registered participants, not N
- session_coordinator_client.go: ThresholdN now uses keygenThresholdN instead of len(parties)

Each comment includes:
- Original code behavior
- New code behavior
- How to revert if persistent sign breaks
- Related files list

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 06:23:39 -08:00
hailin 94ab63db30 fix(co-sign): allow T to N participants for sign sessions
- Change ValidateSessionCreation to accept T <= participantCount <= N for sign sessions
- Co-managed sign uses exactly T parties
- Persistent sign uses T+1 parties
- Both now pass validation with correct keygenThresholdN

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 06:19:57 -08:00
hailin 99fa003b12 fix(co-sign): fix session start logic to check all registered participants
- CanStart(): Check if all registered participants have joined, not based on T/N
- AddParticipant(): Keep N as max limit (API handles T vs T+1 validation)
- AllPartiesReady(): Check all registered participants, not based on T/N
- This approach works for both co-managed (T parties) and persistent (T+1 parties) signing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 06:09:14 -08:00
hailin a09e163704 fix(co-sign): fix CanStart() to check T parties for sign sessions
- For keygen sessions: require all N parties to join before starting
- For sign sessions: require only T parties to join before starting
- This fixes session_started event not being triggered for signing sessions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 06:01:40 -08:00
hailin 2a95dd107f fix(co-sign): allow signing sessions with t participants instead of n
- Modify ValidateSessionCreation to differentiate between keygen and sign sessions
- For keygen: require participantCount == threshold.N() (all parties must participate)
- For sign: require participantCount == threshold.T() (only t parties needed)
- This fixes "session is full" error when creating signing session with 3 parties but n=5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 05:45:05 -08:00
hailin e114723ab0 feat(mpc-system): add server-party-co-managed for co_managed_keygen sessions
- Create new server-party-co-managed service with two-phase event handling
  - Phase 1 (session_created): Store join token and wait
  - Phase 2 (session_started): Execute TSS protocol (same timing as service-party-app)
- Add PartyRoleCoManagedPersistent role to isolate from normal keygen/sign
- Update docker-compose.yml with 3 co-managed party instances
- Update deploy.sh service lists
- Modify selectPartiesByCompositionForCoManaged to use new role

This ensures co_managed_keygen sessions use dedicated parties that behave
100% compatible with service-party-app, without affecting existing keygen/sign flows.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 23:54:45 -08:00
hailin 6de545fcb9 fix(session-coordinator): generate wildcard token for co_managed_keygen external participants 2025-12-29 13:35:05 -08:00
hailin 1b48c05aa7 fix(mpc-system): GetSessionStatus 返回实际的 threshold_n 和 threshold_t
问题:
- Message Router 的 GetSessionStatus 把 TotalParties 当作 ThresholdN 返回
- 导致 server-party 收到错误的 threshold_n=2 而不是 3
- TSS 协议无法正确启动(参与者数量验证失败)

修复:
- 在 session_coordinator.proto 添加 threshold_n 和 threshold_t 字段
- Session Coordinator 返回实际的 threshold 值
- Message Router 透传 threshold 值而不是参与者数量

Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 11:59:53 -08:00
hailin a22fc16313 fix(session-coordinator): 修复 FindExpired SQL 时区问题
- expires_at 存储为 UTC 时间
- 查询时使用 NOW() AT TIME ZONE 'UTC' 确保时区一致
- 避免因时区差异导致 session 过早被标记为过期

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 10:07:28 -08:00
hailin 9bc48d19a9 fix(mpc-system): 修复 co_managed_keygen 参与者 party_index 映射问题
- 在 proto 中添加 ParticipantStatus 消息和 participants 字段
- session-coordinator 返回参与者详细信息(含 party_index)
- account 服务透传 participants 到 HTTP 响应
- service-party-app 使用服务器返回的 party_index 而非数组索引
- 同时返回 join_tokens map 和 join_token 字符串以兼容两种格式

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 09:06:13 -08:00
hailin df8a14211e debug(mpc-system): 添加 joinToken 调试日志
- service-party-app: validateInviteCode 记录 token 长度
- service-party-app: joinSession 记录 token 信息
- service-party-app: 修复 ValidateInviteCodeResult 类型缺少 joinToken 字段
- session-coordinator: JoinSession 记录 token 解析详情

用于调试 "invalid token" 错误

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 05:55:46 -08:00
hailin 5f4c7c135f feat(mpc-system): 完善 co_managed_keygen 流程并添加调试控制台
主要改动:
- service-party-app: 发起方创建会话后自动加入并设置 activeKeygenSession
- service-party-app: 添加轮询机制确保 100% 可靠触发 keygen
- service-party-app: 添加 DebugConsole 组件 (Ctrl+Shift+D 打开)
- service-party-app: 主进程添加 debugLog 系统,日志可实时显示到前端
- session-coordinator: JoinSession 加入 messageRouterClient 发布事件
- session-coordinator: 添加 PublishSessionStarted 方法

修复:
- 发起方不设置 activeKeygenSession 导致无法触发 keygen 的问题
- 加入方可能错过 session_started 事件的时序问题

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 05:32:40 -08:00
hailin a5ab2e8350 fix(session-coordinator): 支持 co_managed_keygen 动态参与者加入
问题: 通过邀请码加入共管钱包会话时报 "party not invited" 错误
原因: 外部参与者不在 party pool 中,CreateSession 时无法预先选择

修复:
- join_session.go: 对于 co_managed_keygen + wildcard token,允许动态添加参与者
- create_session.go: 新增 selectPartiesByCompositionForCoManaged,跳过 TemporaryCount 选择
- report_completion.go: 使用 IsKeygen() 方法,co_managed_keygen 完成后也创建账户记录

注意: 所有修改仅对 co_managed_keygen 类型生效,不影响现有 keygen/sign 流程

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 04:25:11 -08:00
hailin 21985abde5 fix(session-coordinator): 保存 WalletName 和 InviteCode 到数据库
- CreateSessionInput 添加 WalletName 和 InviteCode 字段
- gRPC handler 从请求中读取并传递这些字段
- CreateSession use case 在创建会话时设置这些字段

修复: 通过邀请码查询会话时找不到记录的问题

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 03:07:44 -08:00
hailin 1b5bcf3fda fix(co-managed-wallet): 修复向后兼容性问题并完善protobuf定义
## 变更概述
根据用户反馈,将 Session Coordinator 的函数签名改为可选参数模式,
确保新功能 100% 不影响现有的 keygen/sign 功能。

## 主要变更

### 1. Session Coordinator 向后兼容修复
- 保留原有 `ReconstructSession` 函数签名不变
- 新增 `ReconstructSessionOptions` 结构体存放可选参数
- 新增 `ReconstructSessionWithOptions` 函数支持新字段
- 原函数内部调用新函数,传入 nil options

### 2. Protobuf 定义更新
- CreateSessionRequest 新增字段:
  - wallet_name (field 10): 钱包名称
  - invite_code (field 11): 邀请码
- SessionInfo 新增字段:
  - wallet_name (field 8): 钱包名称
  - invite_code (field 9): 邀请码
- session_type 支持 "co_managed_keygen"

### 3. TSS Party 子进程修复
- 修复 tss.NewPartyID 参数类型错误 (big.Int)
- 修复 go.mod 依赖问题 (ed25519 replace)
- 删除未使用的变量

### 4. 清理错误生成的文件
- 删除 api/proto/*.pb.go (错误位置)
- 保留 api/grpc/coordinator/v1/*.pb.go (正确位置)

## 修改的文件

| 文件 | 变更类型 | 说明 |
|------|---------|------|
| mpc_session.go | 修改 | 添加 ReconstructSessionWithOptions |
| session_postgres_repo.go | 修改 | 使用新函数传入 options |
| session_cache_adapter.go | 修改 | 使用新函数传入 options |
| session_coordinator.proto | 修改 | 添加 wallet_name, invite_code 字段 |
| session_coordinator.pb.go | 重新生成 | 包含新 protobuf 字段 |
| tss-party/main.go | 修复 | NewPartyID 参数和未使用变量 |
| tss-party/go.mod | 修复 | ed25519 依赖替换 |

## 向后兼容性保证

- 所有现有代码调用 ReconstructSession 无需任何修改
- 数据库使用 COALESCE 处理 NULL 值
- Protobuf 新字段使用高序号,不影响现有消息解析
- **影响现有功能的风险: 0%**

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 23:33:40 -08:00
hailin fea01642e7 feat(co-managed-wallet): 添加分布式多方共管钱包创建功能
## 功能概述
实现分布式多方共管钱包创建功能,包括 Admin-Web 扩展和 Service-Party 桌面应用。

## 主要变更

### 1. Admin-Web 扩展 (前端)
- 新增 CoManagedWalletSection 组件 (frontend/admin-web/src/components/features/co-managed-wallet/)
- 在授权管理页面添加共管钱包入口卡片
- 实现创建钱包向导: 配置 → 邀请 → 生成 → 完成
- 包含组件: ThresholdConfig, InviteQRCode, ParticipantList, SessionProgress, WalletResult

### 2. Admin-Service 后端 API
- 新增共管钱包领域实体和枚举 (domain/entities/co-managed-wallet.entity.ts)
- 新增 REST 控制器 (api/controllers/co-managed-wallet.controller.ts)
- 新增服务层 (application/services/co-managed-wallet.service.ts)
- 新增 Prisma 模型: CoManagedWalletSession, CoManagedWallet
- 更新 app.module.ts 注册新模块

### 3. Session Coordinator 扩展 (Go)
- 新增会话类型: SessionTypeCoManagedKeygen ("co_managed_keygen")
- 扩展 MPCSession 实体添加 WalletName 和 InviteCode 字段
- 更新 PostgreSQL 和 Redis 适配器支持新字段
- 新增数据库迁移: 008_add_co_managed_wallet_fields

### 4. Service-Party 桌面应用 (新项目)
- 位置: backend/mpc-system/services/service-party-app/
- 技术栈: Electron + React + TypeScript + Vite
- 包含模块:
  - gRPC 客户端 (连接 Message Router)
  - TSS 处理器 (子进程方式运行 Go TSS 协议)
  - 本地加密存储 (AES-256-GCM)
- 页面: Home, Join, Create, Session, Settings

## 修改的现有文件 (便于回滚)

1. backend/mpc-system/services/session-coordinator/domain/entities/mpc_session.go
   - 添加 SessionTypeCoManagedKeygen 常量
   - 添加 IsKeygen() 方法
   - 添加 WalletName, InviteCode 字段
   - 更新 ReconstructSession, ToDTO, SessionDTO

2. backend/mpc-system/services/session-coordinator/adapters/output/postgres/session_postgres_repo.go
   - 更新 SQL 查询包含 wallet_name, invite_code
   - 更新 Save, FindByUUID, FindByStatus 等方法
   - 更新 scanSessions, sessionRow

3. backend/mpc-system/services/session-coordinator/adapters/output/redis/session_cache_adapter.go
   - 更新 sessionCacheEntry 结构
   - 更新 sessionToCacheEntry, cacheEntryToSession

4. backend/services/admin-service/prisma/schema.prisma
   - 新增 WalletSessionStatus 枚举
   - 新增 CoManagedWalletSession, CoManagedWallet 模型

5. backend/services/admin-service/src/app.module.ts
   - 导入并注册共管钱包相关组件

6. frontend/admin-web/src/app/(dashboard)/authorization/page.tsx
   - 导入并添加 CoManagedWalletSection

7. frontend/admin-web/src/infrastructure/api/endpoints.ts
   - 添加 CO_MANAGED_WALLETS API 端点

## 回滚说明

如需回滚此功能:
1. 回滚数据库迁移: 运行 008_add_co_managed_wallet_fields.down.sql
2. 删除新增文件夹:
   - backend/mpc-system/services/service-party-app/
   - frontend/admin-web/src/components/features/co-managed-wallet/
   - backend/services/admin-service/src/**/co-managed-wallet*
3. 恢复修改的文件到前一个版本
4. 运行 prisma generate 重新生成 Prisma 客户端

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 21:39:07 -08:00
hailin a01284678d feat(wallet/mpc): 增强提现和充值流程可靠性
## 主要改进

### MPC 签名系统 (mpc-system)
- 添加签名缓存机制,避免重复签名请求
- 修复 yParity 恢复逻辑,确保签名格式正确
- 优化签名完成报告流程

### 区块链服务 (blockchain-service)
- EIP-1559 降级为 Legacy 交易(KAVA 测试网兼容)
- 修复 gas 估算逻辑

### 钱包服务 (wallet-service)
- 添加乐观锁机制 (version 字段) 防止并发修改
- 提现确认流程添加事务保护 + 乐观锁
- 提现失败时正确解冻 amount + fee
- 充值流程添加事务保护 + 乐观锁
- Kafka consumer 添加错误重抛,触发重试机制

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 19:47:20 -08:00
hailin ac4d9283dc fix: preserve original PartyIndex from keygen for signing sessions
- Add PartyIndex field to protobuf ParticipantInfo message
- Pass original PartyIndex from account shares to session coordinator
- Use original PartyIndex instead of loop variable when creating participants
- This fixes TSS signing failures when non-consecutive parties are selected
2025-12-06 10:45:05 -08:00
hailin fd74bc825a chore: add detailed logging for keygen_session_id tracing
Add logging at key points to trace keygen_session_id flow:
- Account Handler: log keygen_session_id when creating signing session
- Session Coordinator: log keygen_session_id in CreateSession and JoinSession
- Message Router: log keygen_session_id when proxying JoinSession
- Server Party: log keygen_session_id when joining session

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 09:21:19 -08:00
hailin 3d176e1132 feat: complete keygen_session_id implementation for signing sessions
- Regenerate protobuf Go code with KeygenSessionId fields
- Session Coordinator correctly parses, stores, and returns keygen_session_id
- Message Router Client parses keygen_session_id in JoinSession response
- participate_signing.go uses keygen_session_id for precise share lookup
- Database schema already includes keygen_session_id column

This fixes the signing issue where wrong keyshares were loaded for multi-account scenarios.
2025-12-06 08:57:30 -08:00
hailin 23eff00d76 feat: add KeygenSessionID to MPCSession entity
- Add KeygenSessionID field to MPCSession struct for tracking which keygen's shares to use
- This is the first step in完整的修复流程
2025-12-06 08:40:38 -08:00
hailin 13e81e37c9 fix(db): update repository to save and load delegate_party_id field
Update session repository to properly handle delegate_party_id column:
- Add delegate_party_id to Save method INSERT and UPDATE statements
- Add DelegatePartyID field to sessionRow struct
- Update FindByUUID, FindByStatus, FindExpired, FindActive SELECT queries
- Update scanSessions method to scan and pass delegate_party_id
- Remove placeholder empty string, now loads actual value from database

This completes the delegate party functionality by ensuring the delegate party ID
is persisted and retrieved correctly from the database.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 05:52:34 -08:00
hailin 36e1359f43 fix(session-coordinator): pass PartyComposition from gRPC request to use case
Fixed critical bug where PartyComposition (persistent/delegate party counts) was being sent
by account-service in gRPC request but was not being extracted and passed to the CreateSession
use case, causing delegate party selection to fail.

Changes:
- Extract PartyComposition from protobuf request and pass to CreateSessionInput
- Add logging for party composition values in gRPC handler
- Return delegate_party_id and selected_parties in CreateSessionResponse
- Load session after creation to get delegate party ID

This fixes the issue where require_delegate=true had no effect and all parties selected
were persistent parties instead of 2 persistent + 1 delegate.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 05:38:38 -08:00
hailin 5f12404be4 fix: remove dynamic participant join to fix concurrent party_index assignment
- Remove dynamic participant addition in JoinSession
- Participants must be pre-created in CreateSession
- Add ErrPartyNotInvited error for unauthorized join attempts
- Fix Redis adapter to include version parameter in ReconstructSession
- This fixes VSS verification failures caused by inconsistent party indices
2025-12-06 04:54:40 -08:00
hailin b72268c1ce feat(mpc-system): implement optimistic locking for session updates
Implement version-based optimistic locking to prevent concurrent update conflicts
when multiple parties simultaneously report completion during keygen operations.

Changes:
- Add version column to mpc_sessions table (migration 004)
- Add Version field to MPCSession entity
- Define ErrOptimisticLockConflict error
- Update SessionPostgresRepo.Update() to check version and increment on success
- Add automatic retry logic (max 3 attempts) to ReportCompletionUseCase
- Update Save and all query methods (FindByStatus, FindExpired, etc.) to handle version field

This replaces pessimistic locking (FOR UPDATE) with optimistic locking using
the industry-standard pattern: WHERE version = $n and checking rowsAffected.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 04:16:32 -08:00
hailin bfe129da51 test(logger): add debug log test to verify debug level works
Added test debug log immediately after logger initialization.
If debug logging is working, we should see this message.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 03:23:24 -08:00
hailin 32e3970f34 debug: add logger level debug output 2025-12-06 02:56:26 -08:00
hailin fb9c85f883 debug(coordinator): add detailed logging to track concurrent update issue
Add comprehensive debug logs to:
1. report_completion.go - log all participant statuses at key points
2. session_postgres_repo.go - log before/after each participant update

This will help identify why server-party-1 status remains 'invited'
despite successfully reporting completion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 02:11:28 -08:00
hailin 380bf46fb6 fix(coordinator): add row-level locking to prevent concurrent update conflicts
Problem:
Multiple parties reporting completion simultaneously caused lost updates
because each transaction would read the full session, modify their
participant status, then update ALL participants - causing last-write-wins
behavior.

Solution:
Add SELECT ... FOR UPDATE locks on both mpc_sessions and participants
tables at the start of the Update transaction. This serializes concurrent
updates and prevents lost updates.

Lock order:
1. Lock session row (FOR UPDATE)
2. Lock all participant rows for this session (FOR UPDATE)
3. Perform updates
4. Commit (releases locks)

This ensures that concurrent ReportCompletion calls are fully serialized
and each participant status update is preserved.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 01:58:05 -08:00
hailin aab88834f9 fix(coordinator): prevent lost updates in concurrent participant status changes
Fix critical concurrency bug where simultaneous ReportCompletion calls from
multiple parties could cause lost database updates. Changed from UPSERT-all
to UPDATE-individual pattern to ensure each participant status update is
atomic and won't be overwritten by concurrent transactions.

Before: All participants were UPSERTed in single transaction, causing
last-commit-wins behavior that lost earlier status updates.

After: Each participant is UPDATEd individually using UPDATE...WHERE, then
INSERT only if row doesn't exist. This prevents concurrent updates to
different participants from conflicting.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 01:35:38 -08:00
hailin 00b48bab50 fix(coordinator): handle all participant states in ReportCompletion with proper state transitions
- Add switch-case to handle Invited, Joined, and Ready states
- Auto-transition Invited -> Joined -> Ready -> Completed
- Auto-transition Joined -> Ready -> Completed
- Auto-transition Ready -> Completed
- Return error for invalid states (Failed, Completed, etc.)
- Fixes 'cannot transition to completed status' error
- Applies to all parties including server-party-api
2025-12-06 01:09:49 -08:00
hailin 4e14212147 fix(coordinator): auto-transition participant to Ready before Completed
ReportCompletion was failing with "cannot transition to completed status"
because participants were in Joined state trying to transition directly to
Completed, which violates the state machine flow: Joined -> Ready -> Completed.

Changes:
- Check participant status before marking as Completed
- Auto-transition Joined -> Ready if needed
- Then transition Ready -> Completed
- Add debug logging for auto-transition

This fixes the error seen during keygen completion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 00:33:22 -08:00
hailin 78119bc6a4 fix(proto): add party_index to JoinSessionResponse for correct index assignment
The JoinSessionResponse from coordinator was missing party_index field,
causing message router to try finding self's index in OtherParties (which
only contains other parties). This resulted in incorrect party index
assignment leading to "duplicate indexes" error in TSS keygen.

Changes:
- Add party_index field to coordinator's JoinSessionResponse proto
- Coordinator now includes PartyIndex in gRPC response
- Message router uses party_index from coordinator instead of searching

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 00:08:47 -08:00
hailin aa74e2b2e2 feat(mpc-system): add signing parties configuration and delegate signing support
- Add signing-config API endpoints (POST/PUT/DELETE/GET) for configuring
  which parties should participate in signing operations
- Add SigningParties field to Account entity with database migration
- Modify CreateSigningSession to use configured parties if set,
  otherwise use all active parties (backward compatible)
- Add delegate party signing support: user provides encrypted share
  at sign time for delegate party to use
- Update protobuf definitions for DelegateUserShare in session events
- Add ShareTypeDelegate to support hybrid custody model

API endpoints:
- POST /accounts/:id/signing-config - Set signing parties (first time)
- PUT /accounts/:id/signing-config - Update signing parties
- DELETE /accounts/:id/signing-config - Clear config (use all parties)
- GET /accounts/:id/signing-config - Get current configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 22:47:55 -08:00
hailin 135e821386 feat(mpc-system): integrate reliability mechanisms and enable party-driven architecture
- Enable SubscribeSessionEvents for automatic session participation
- Integrate heartbeat mechanism with pending message count
- Add ACK sending after message receipt for reliable delivery
- Add party activity tracking in session coordinator
- Add CountPendingByParty for heartbeat response
- Add retry package with exponential backoff for gRPC clients
- Add memory-based message broker and event publisher adapters
- Add account service integration for keygen completion
- Add party timeout checking background job
- Add notification service stub for future implementation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 20:30:03 -08:00
hailin 34f0f7b897 chore(mpc-system): update Dockerfiles to Go 1.24 and fix line endings
- Update all Dockerfiles from Go 1.21 to Go 1.24 (required by go.mod)
- Fix line endings in deploy.sh and .env.example for Unix compatibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 16:40:32 -08:00
hailin c52b6aa980 refactor(mpc-system): replace K8s party discovery with Message Router-based discovery
- Add GetRegisteredParties gRPC method to Message Router for party discovery
- Create MessageRouterPartyDiscovery adapter in Session Coordinator
- Remove K8s dependency from Session Coordinator (works in any environment)
- Add party registration to server-party-api on startup
- Fix docker-compose.yml: add MESSAGE_ROUTER_ADDR to session-coordinator

This change implements a fully decentralized party discovery mechanism:
- Parties register themselves to Message Router on startup
- Session Coordinator queries Message Router for available parties
- Works in Docker Compose, K8s, or any deployment environment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 16:37:16 -08:00
hailin c976fd3eb1 feat(mpc-system): implement party-driven architecture with SessionEvent broadcasting
Fully implemented party-driven architecture according to international standards (Fireblocks, ING Bank, ZenGo patterns):

**Architecture Changes:**
- Parties actively connect to Message Router (not passively called by coordinator)
- Session Coordinator publishes SessionEvents when creating sessions
- Parties automatically subscribe and respond to SessionEvents
- PartyID-based routing instead of network addresses

**New Features:**
1. Session Coordinator → Message Router gRPC Client
   - PublishSessionEvent RPC for broadcasting session lifecycle events
   - Automatic event publishing after session creation

2. Message Router SessionEvent Broadcasting
   - SubscribeSessionEvents RPC for party subscriptions
   - PublishSessionEvent RPC for coordinator publishing
   - Targeted broadcasting to selected parties

3. Server-Party Auto-Registration & Subscription
   - RegisterParty on startup with role (persistent/delegate/temporary)
   - SubscribeSessionEvents for automatic session notifications
   - Event handler for automatic MPC participation

**Files Modified:**
- api/proto/message_router.proto: Added SessionEvent messages and RPCs
- services/message-router/adapters/input/grpc/message_grpc_handler.go: PublishSessionEvent handler
- services/session-coordinator/adapters/output/grpc/message_router_client.go: NEW - gRPC client
- services/session-coordinator/application/use_cases/create_session.go: SessionEvent publishing
- services/session-coordinator/cmd/server/main.go: Message Router client initialization
- services/server-party/adapters/output/grpc/message_router_client.go: RegisterParty + SubscribeSessionEvents
- services/server-party/cmd/server/main.go: Party registration and event subscription (commented pending full integration)
- go.mod/go.sum: Updated grpc to v1.77.0

**Technical Details:**
- gRPC streaming for SessionEvent subscriptions
- Non-blocking channel broadcasts prevent slow subscribers from blocking
- PartyRole support (persistent/delegate/temporary)
- Join tokens distributed via SessionEvent

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 08:44:05 -08:00
hailin 747e4ae8ef refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing
- Remove Address field from PartyEndpoint (parties connect to router themselves)
- Update K8s Discovery to only manage PartyID and Role labels
- Add Party registration and SessionEvent protobuf definitions
- Implement PartyRegistry and SessionEventBroadcaster domain logic
- Add RegisterParty and SubscribeSessionEvents gRPC handlers
- Prepare infrastructure for party-driven MPC coordination

This is the first phase of migrating from coordinator-driven to party-driven
architecture following international MPC system design patterns.
2025-12-05 08:11:28 -08:00
hailin e975e9d86c feat(mpc-system): implement party role labels with strict persistent-only default
Implement Solution 1 (Party Role Labels) to differentiate between persistent
and delegate parties, with strict security guarantees for MPC threshold systems.

Key Features:
- PartyRole enum: persistent, delegate, temporary
- K8s pod labels (party-role) for role identification
- Role-based party filtering and selection
- Strict persistent-only default policy (no fallback)
- Optional PartyComposition for custom party requirements

Security Guarantees:
- Default: MUST use persistent parties (store shares in database)
- Fail fast with clear error if insufficient persistent parties
- No silent fallback to mixed/delegate parties
- Empty PartyComposition validation prevents accidental bypass
- MPC system compatibility maintained

Implementation:
1. Added PartyRole type with persistent/delegate/temporary constants
2. Extended PartyEndpoint with Role field
3. K8s party discovery extracts role from pod labels (defaults to persistent)
4. Session creation logic with strict persistent requirement
5. PartyComposition support for explicit mixed-role sessions
6. K8s deployment files with party-role labels

Files Modified:
- services/session-coordinator/application/ports/output/party_pool_port.go
- services/session-coordinator/infrastructure/k8s/party_discovery.go
- services/session-coordinator/application/ports/input/session_management_port.go
- services/session-coordinator/application/use_cases/create_session.go
- k8s/server-party-deployment.yaml (persistent role)

Files Added:
- k8s/server-party-api-deployment.yaml (delegate role)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 07:08:59 -08:00
hailin cf534ec178 feat(mpc-system): implement Kubernetes-based dynamic party pool architecture
Major architectural refactoring to align with international MPC standards
and enable horizontal scalability.

## Core Changes

### 1. DeviceInfo Made Optional
- Modified DeviceInfo.Validate() to allow empty device information
- Aligns with international MPC protocol standards
- MPC protocol layer should not mandate device-specific metadata
- Location: services/session-coordinator/domain/entities/device_info.go

### 2. Kubernetes Party Discovery Service
- Created infrastructure/k8s/party_discovery.go (220 lines)
- Implements dynamic service discovery via Kubernetes API
- Supports in-cluster config and kubeconfig fallback
- Auto-refreshes party list every 30s (configurable)
- Health-aware selection (only ready pods)
- Uses pod names as unique party IDs

### 3. Party Pool Architecture
- Defined PartyPoolPort interface for abstraction
- CreateSessionUseCase now supports automatic party selection
- When no participants specified, selects from K8s pool
- Graceful fallback to dynamic join mode if discovery fails
- Location: services/session-coordinator/application/ports/output/party_pool_port.go

### 4. Integration Updates
- Modified CreateSessionUseCase to inject partyPool
- Updated session-coordinator main.go to initialize K8s discovery
- gRPC handler already supports optional participants
- Added k8s client-go dependencies (v0.29.0) to go.mod

## Kubernetes Deployment

### New K8s Manifests
- k8s/namespace.yaml: mpc-system namespace
- k8s/configmap.yaml: shared configuration
- k8s/secrets-example.yaml: secrets template
- k8s/server-party-deployment.yaml: scalable party pool (3+ replicas)
- k8s/session-coordinator-deployment.yaml: coordinator with RBAC
- k8s/README.md: comprehensive deployment guide

### RBAC Configuration
- ServiceAccount for session-coordinator
- Role with pods/services get/list/watch permissions
- RoleBinding to grant discovery capabilities

## Key Features

 Dynamic service discovery via Kubernetes API
 Horizontal scaling (kubectl scale deployment)
 No hardcoded party IDs
 Health-aware party selection
 Graceful degradation when K8s unavailable
 MPC protocol compliance (optional DeviceInfo)

## Deployment Modes

### Docker Compose (Existing)
- Fixed 3 parties (server-party-1/2/3)
- Quick setup for development
- Backward compatible

### Kubernetes (New)
- Dynamic party pool
- Auto-discovery and scaling
- Production-ready

## Documentation

- Updated main README.md with deployment options
- Added architecture diagram showing scalable party pool
- Created comprehensive k8s/README.md with:
  - Quick start guide
  - Scaling instructions
  - Troubleshooting section
  - RBAC configuration details

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 06:12:49 -08:00
hailin 553ffd365e feat(mpc-system): optimize party index handling and add gRPC debug logs
- Simplified participant list handling in JoinSession client
- Added debug logging for party_index conversion in gRPC messages
- Removed redundant party filtering logic
- Added detailed logging to trace protobuf field values

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 04:00:09 -08:00
hailin c9cb5676d0 debug: add logging for participant information in gRPC handlers
Added debug logging to track participant details including party_index in:
- account service MPC keygen handler
- session coordinator gRPC client
- session coordinator gRPC handler

This helps debug the party index assignment issue where all parties
were receiving index 0 instead of unique indices (0, 1, 2).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 03:11:18 -08:00
hailin b4d6b0f264 feat(mpc-system): add connection retry logic with exponential backoff
- Add retry mechanism for PostgreSQL connections (10 retries, 2s base delay)
- Add retry mechanism for RabbitMQ connections (10 retries, 2s base delay)
- Add retry mechanism for Redis connections (10 retries, 2s base delay)
- Use exponential backoff: delay increases with each retry attempt
- Log detailed retry information (attempt number, max retries, errors)
- Redis continues without cache if all retries fail (non-critical)
- Database and RabbitMQ return error after all retries (critical)

This resolves startup failures when dependent services are slow to initialize,
particularly RabbitMQ which may pass health checks but not be fully ready.
2025-12-04 23:12:15 -08:00
hailin 2556fea841 refactor: separate configuration from code following 12-Factor App principles
- Created .env.example files with comprehensive security warnings
- Removed hardcoded IP addresses and credentials from docker-compose files
- Made database passwords mandatory (fail-fast on missing config)
- Removed Chinese mirror sources from all Dockerfiles
- Enhanced deploy.sh scripts with .env validation and auto-creation
- Added comprehensive README.md deployment guides
- Changed ALLOWED_IPS default to enable cross-server deployment
- Updated all docker-compose files to use environment variables

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 21:46:35 -08:00
Developer a80e80f179 perf(mpc-system): 添加 Alpine 镜像加速配置
为所有 Dockerfile 的 builder 和 final 阶段添加阿里云镜像源:
- 使用 mirrors.aliyun.com 替代 dl-cdn.alpinelinux.org
- 显著加速中国区 Docker 构建中的 apk 包下载

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 17:38:43 -08:00
Developer 1700b8b57c fix(mpc-system): 使用 curl 进行健康检查
- 将 wget --spider (HEAD 请求) 改为 curl -sf (GET 请求)
- Gin 路由只响应 GET 请求,HEAD 请求返回 404
- 安装 curl 替代 wget

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 17:23:17 -08:00