rwadurian/backend/mpc-system/services
hailin 9f7a5cbb12 fix(android): fix 2-of-3 signing failure caused by session_started race condition
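## Problem
When the user ticks "include server backup" and starts a 2-of-3 signing, the Android device cannot begin signing,
so the whole signing flow hangs. The logs show:
- The server joins successfully and sends TSS messages ✓
- Android receives the session_started event ✓
- But Android never calls startSigning()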

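## Root Cause
A classic race condition:
1. Android calls the createSignSessionWithOptions() API
2. The server immediately calls JoinSession during the session_created phase
3. Both parties have joined, so the session_started event fires right away (12.383ms)
4. But Android's result.fold callback has not finished yet (it only sets the state at 12.387ms)
5. MainViewModel checks pendingSignInitiatorInfo, finds it null, and skips signing

The window is only 4ms, but differences in CPU performance can drive the failure rate to 100%.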
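
The ordering problem in steps 3 to 5 can be reproduced without any threads. The sketch below is a simplified model, not the app's code: `NaiveInitiator`, `pendingInfo` and `runOrdering` are hypothetical names chosen for illustration.

```kotlin
// Hypothetical stand-in for the ViewModel state; not the real MainViewModel.
class NaiveInitiator {
    var pendingInfo: String? = null   // set only once the create-session callback completes
    var signingStarted = false

    // Mirrors the session_started check: signing is skipped when no pending info exists yet.
    fun onSessionStarted() {
        if (pendingInfo != null) signingStarted = true
    }
}

// Replays the two possible orderings and reports whether signing started.
fun runOrdering(eventFirst: Boolean): Boolean {
    val initiator = NaiveInitiator()
    if (eventFirst) {
        initiator.onSessionStarted()          // the event wins the race
        initiator.pendingInfo = "session-1"   // the state arrives too late
    } else {
        initiator.pendingInfo = "session-1"
        initiator.onSessionStarted()
    }
    return initiator.signingStarted
}
```

In this model `runOrdering(eventFirst = true)` returns `false` and `runOrdering(eventFirst = false)` returns `true`: nothing retries once the event has been dropped, which is why a narrow window can still fail every time on a given device.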

## Solution
An architecture-level fix, modeled on the PendingSessionCache pattern in server-party-co-managed:

### 1. Caching in the TssRepository layer (Lines ~210-223)
```kotlin
// Cache the signing info immediately after JoinSession succeeds
private data class PendingSignInfo(
    val sessionId: String,
    val shareId: Long,
    val password: String,
    val messageHash: String
)
private var pendingSignInfo: PendingSignInfo? = null
private var signingTriggered: Boolean = false
```

### 2. Auto-trigger when the event arrives (Lines ~273-320)
```kotlin
when (event.eventType) {
    "session_started" -> {
        // Cached signing info is present, so trigger signing automatically
        if (pendingSignInfo != null && !signingTriggered) {
            signingTriggered = true
            repositoryScope.launch {
                startSigning(...)
                waitForSignature()
            }
        }
        // Still notify MainViewModel (as a fallback)
        sessionEventCallback?.invoke(event)
    }
}
```
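
The check `pendingSignInfo != null && !signingTriggered` followed by a separate assignment is only safe if all callers run on one thread. If the event handler and the ViewModel could run concurrently, the once-only guarantee can be made explicit with an atomic compare-and-set. This is a minimal sketch under that assumption, not the code in TssRepository: `SignTrigger`, `PendingSign` and the `startSigning` callback are hypothetical names.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.atomic.AtomicReference

// Hypothetical, trimmed-down version of the cached signing info.
data class PendingSign(val sessionId: String, val messageHash: String)

class SignTrigger(private val startSigning: (PendingSign) -> Unit) {
    private val pending = AtomicReference<PendingSign?>(null)
    private val triggered = AtomicBoolean(false)

    // Called before any event can arrive: reset the flag first, then publish the info.
    fun cache(info: PendingSign) {
        triggered.set(false)
        pending.set(info)
    }

    fun isTriggered(): Boolean = triggered.get()

    // Returns true only for the single caller that wins the right to start signing.
    fun onSessionStarted(sessionId: String): Boolean {
        val info = pending.get() ?: return false
        if (info.sessionId != sessionId) return false
        if (!triggered.compareAndSet(false, true)) return false
        startSigning(info)
        return true
    }
}
```

With `compareAndSet`, two callers that observe the cached info at the same moment cannot both pass the guard, so `startSigning` runs at most once per cached session.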

### 3. Re-entrancy check in MainViewModel (MainViewModel.kt ~1488)
```kotlin
private fun startSignAsInitiator(selectedParties: List<String>) {
    // Check whether TssRepository has already triggered signing
    if (repository.isSigningTriggered()) {
        Log.d("MainViewModel", "Signing already triggered, skipping duplicate")
        return
    }
    startSigningProcess(...)
}
```

## Workflow
```
createSignSessionWithOptions()
    ↓
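[Changed] cache pendingSignInfo (before any event)
    ↓
auto-join session
    ↓
════ 4ms race window ════
    ↓
session_started arrives (12ms)
    ↓
[Changed] TssRepository finds the cached info and triggers signing automatically ✓
    ↓
[Changed] set signingTriggered=true to prevent duplicates
    ↓
MainViewModel.result.fold completes (50ms)
    ↓
[Changed] sees signing was already triggered and skips the duplicate run ✓
    ↓
Signing completes successfully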
```

## Key Changes

### TssRepository.kt
1. Add the PendingSignInfo cache and the signingTriggered flag (Line ~210-223)
2. createSignSessionWithOptions caches the signing info (Line ~3950-3965)
3. The session_started handler triggers signing automatically (Line ~273-320)
4. Expose isSigningTriggered() for the ViewModel to check (Line ~399-405)

### MainViewModel.kt
1. Add a re-entrancy check to startSignAsInitiator (Line ~1488-1495)

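## Backward Compatibility
100% backward compatible:
- The original MainViewModel logic is kept as a fallback
- The cache is set only when includeServerBackup=true (other flows are unchanged)
- The added re-entrancy check does not affect normal signing
- Ordinary 2-party signing, 3-party signing, and other flows are completely unaffected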

## Verification Logs
After the fix, the logs will show:
```
[CO-SIGN-OPTIONS] Cached pendingSignInfo for sessionId=xxx
[RACE-FIX] session_started arrived! Auto-triggering signing
[RACE-FIX] Calling startSigning from TssRepository...
[RACE-FIX] Signing already triggered, skipping duplicate from MainViewModel
```
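
When checking a captured log against the sequence above, a small helper can confirm that the markers appear in the expected order. This is only an illustrative sketch: `containsInOrder` and `raceFixMarkers` are hypothetical names, and the marker strings are prefixes taken from the log lines listed above.

```kotlin
// Returns true when every marker appears in the log, in the given order.
// Other lines may be interleaved between the matching ones.
fun containsInOrder(logLines: List<String>, markers: List<String>): Boolean {
    var next = 0
    for (line in logLines) {
        if (next < markers.size && line.contains(markers[next])) next++
    }
    return next == markers.size
}

// Prefixes of the four log lines expected after the fix.
val raceFixMarkers = listOf(
    "[CO-SIGN-OPTIONS] Cached pendingSignInfo",
    "[RACE-FIX] session_started arrived!",
    "[RACE-FIX] Calling startSigning from TssRepository",
    "[RACE-FIX] Signing already triggered"
)
```

Passing the captured log lines and `raceFixMarkers` to `containsInOrder` returns `true` only if the cache line precedes the auto-trigger lines, which is the ordering the fix relies on.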

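## Technical Principles
- Reject delay-based fixes: differences in CPU performance make them unreliable
- Adopt an architectural fix: remove the root of the race condition without relying on timing assumptions
- Follow an established pattern: the PendingSessionCache in server-party-co-managed
- Defense in depth: automatic trigger in the Repository + ViewModel fallback + re-entrancy check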

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 20:11:17 -08:00
account Revert "fix(co-keygen): convert threshold at storage time to match tss-lib convention" 2025-12-31 10:24:25 -08:00
message-router fix(message-router): prevent subscription race condition on gRPC reconnect 2026-01-01 10:04:11 -08:00
server-party fix(participate_signing): restore the UserShareData branch of the Execute method 2026-01-26 19:00:52 -08:00
server-party-api fix(context): use parent context instead of Background() to allow proper cancellation 2025-12-06 06:36:34 -08:00
server-party-co-managed fix(co-managed): fix critical bug where signing used the wrong keyshare 2026-01-26 19:40:14 -08:00
service-party-android fix(android): fix 2-of-3 signing failure caused by session_started race condition 2026-01-26 20:11:17 -08:00
service-party-app fix(tss): fix signing failure after backup restore 2026-01-20 00:39:05 -08:00
session-coordinator feat(session): broadcast participant_joined event via gRPC for real-time UI updates 2026-01-01 08:34:47 -08:00
tss-wasm feat(tss): add real-time round progress from msg.Type() parsing 2026-01-01 22:41:51 -08:00