## Problem

When the user checks "Include server backup" and starts a 2-of-3 signing, the Android device cannot begin signing, so the whole signing flow hangs. The logs show:

- The server joins successfully and sends TSS messages ✓
- Android receives the session_started event ✓
- But Android never executes startSigning() ❌

## Root Cause

A classic race condition:

1. Android calls the createSignSessionWithOptions() API
2. The server immediately calls JoinSession during the session_created phase
3. Both parties have joined → the session_started event fires immediately (12.383ms)
4. But Android's result.fold callback has not finished yet (state is only set at 12.387ms)
5. MainViewModel checks pendingSignInitiatorInfo, finds it null, and signing is skipped

The window is only 4ms, but differences in CPU performance can push the failure rate to 100%.

## Solution

An architecture-level fix, modeled on the PendingSessionCache pattern in server-party-co-managed:

### 1. Caching in the TssRepository layer (Lines ~210-223)

```kotlin
// Cache the signing info immediately after JoinSession succeeds
private data class PendingSignInfo(
    val sessionId: String,
    val shareId: Long,
    val password: String,
    val messageHash: String
)

private var pendingSignInfo: PendingSignInfo? = null
private var signingTriggered: Boolean = false
```

### 2. Automatic trigger when the event arrives (Lines ~273-320)

```kotlin
when (event.eventType) {
    "session_started" -> {
        // Cached signing info detected: trigger signing automatically
        if (pendingSignInfo != null && !signingTriggered) {
            signingTriggered = true
            repositoryScope.launch {
                startSigning(...)
                waitForSignature()
            }
        }
        // Still notify MainViewModel (as a fallback)
        sessionEventCallback?.invoke(event)
    }
}
```

### 3. Re-entrancy guard in MainViewModel (MainViewModel.kt ~1488)

```kotlin
private fun startSignAsInitiator(selectedParties: List<String>) {
    // Check whether TssRepository has already triggered signing
    if (repository.isSigningTriggered()) {
        Log.d("MainViewModel", "Signing already triggered, skipping duplicate")
        return
    }
    startSigningProcess(...)
}
```

## Workflow

```
createSignSessionWithOptions()
    ↓
[CHANGED] cache pendingSignInfo (before any event)
    ↓
auto-join session
    ↓
════ 4ms race window ════
    ↓
session_started arrives (12ms)
    ↓
[CHANGED] TssRepository detects the cache and triggers signing automatically ✓
    ↓
[CHANGED] set signingTriggered=true to prevent duplicates
    ↓
MainViewModel.result.fold completes (50ms)
    ↓
[CHANGED] detects that signing was already triggered and skips the duplicate ✓
    ↓
signing completes successfully
```

## Key Changes

### TssRepository.kt

1. Add the PendingSignInfo cache and the signingTriggered flag (Line ~210-223)
2. createSignSessionWithOptions caches the signing info (Line ~3950-3965)
3. The session_started handler triggers signing automatically (Line ~273-320)
4. Expose isSigningTriggered() for the ViewModel to check (Line ~399-405)

### MainViewModel.kt

1. Add a re-entrancy guard to startSignAsInitiator (Line ~1488-1495)

## Backward Compatibility

✅ 100% backward compatible:

- The original MainViewModel logic is kept as a fallback
- The cache is set only when includeServerBackup=true (other flows are unchanged)
- The re-entrancy guard does not affect normal signing
- Ordinary 2-party signing, 3-party signing, and similar flows are unaffected

## Verification Logs

After the fix, the following is logged:

```
[CO-SIGN-OPTIONS] Cached pendingSignInfo for sessionId=xxx
[RACE-FIX] session_started arrived! Auto-triggering signing
[RACE-FIX] Calling startSigning from TssRepository...
[RACE-FIX] Signing already triggered, skipping duplicate from MainViewModel
```

## Technical Principles

❌ Reject delay-based fixes: CPU performance differences make them unreliable
✅ Use an architectural fix: remove the root of the race condition without relying on timing assumptions
✅ Follow an established pattern: the PendingSessionCache in server-party-co-managed
✅ Defense in depth: automatic trigger in the Repository + ViewModel fallback + re-entrancy guard

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
MPC System Deployment Guide
Multi-Party Computation (MPC) system for secure threshold signature scheme (TSS) implementation in the RWADurian project.
Table of Contents
- Overview
- Architecture
- Quick Start
- Configuration
- Deployment Commands
- Services
- Security
- Troubleshooting
- Production Deployment
Overview
The MPC system implements a 2-of-3 threshold signature scheme where:
- Server parties from a dynamically scalable pool hold key shares
- At least 2 parties are required to generate signatures (configurable threshold)
- User shares are generated dynamically and returned to the calling service
- All shares are encrypted using AES-256-GCM
Key Features
- Threshold Cryptography: Configurable N-of-M TSS for enhanced security
- Dynamic Party Pool: Kubernetes-based service discovery for automatic party scaling
- Distributed Architecture: Services communicate via gRPC and WebSocket
- Secure Storage: AES-256-GCM encryption for all stored shares
- API Authentication: API key and IP-based access control
- Session Management: Coordinated multi-party computation sessions
- MPC Protocol Compliance: DeviceInfo optional, aligning with international MPC standards
Architecture
┌────────────────────────────────────────────────────────────────┐
│                           MPC System                           │
│                                                                │
│  ┌──────────────────┐        ┌──────────────────┐              │
│  │ Account Service  │        │ Server Party API │              │
│  │ (Port 4000)      │        │ (Port 8083)      │              │
│  │ External API     │        │ User Share Gen   │              │
│  └────────┬─────────┘        └────────┬─────────┘              │
│           │                           │                        │
│           ▼                           ▼                        │
│  ┌──────────────────┐        ┌──────────────────┐              │
│  │ Session          │◄──────►│ Message Router   │              │
│  │ Coordinator      │        │ (Port 8082)      │              │
│  │ (Port 8081)      │        │ WebSocket        │              │
│  └────────┬─────────┘        └────────┬─────────┘              │
│           │                           │                        │
│           ▼                           ▼                        │
│  ┌────────────────────────────────────────────┐                │
│  │ Server Party Pool (Dynamically Scalable)   │                │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  │                │
│  │  │ Party 1  │  │ Party 2  │  │ Party 3  │  │ K8s Discovery  │
│  │  │ (TSS)    │  │ (TSS)    │  │ (TSS)    │  │ Auto-selected  │
│  │  └──────────┘  └──────────┘  └──────────┘  │ from pool      │
│  │  ┌──────────┐  ... can scale up/down       │                │
│  │  │ Party N  │                              │                │
│  │  └──────────┘                              │                │
│  └────────────────────────────────────────────┘                │
│                                                                │
│  ┌────────────────────────────────────────────┐                │
│  │          Infrastructure Services           │                │
│  │       PostgreSQL │ Redis │ RabbitMQ        │                │
│  └────────────────────────────────────────────┘                │
└────────────────────────────────────────────────────────────────┘
                                 │
                                 │ Network Access
                                 ▼
                   ┌──────────────────────────┐
                   │     Backend Services     │
                   │   mpc-service (caller)   │
                   └──────────────────────────┘
Deployment Options
This system supports two deployment modes:
Option 1: Docker Compose (Development/Simple Deployment)
- Quick setup for development or simple production environments
- Fixed 3 server parties (hardcoded IDs)
- See instructions below in "Quick Start"
Option 2: Kubernetes (Production/Scalable Deployment)
- Dynamic party pool with service discovery
- Horizontally scalable server parties
- Recommended for production environments
- See k8s/README.md for detailed instructions
Quick Start (Docker Compose)
Prerequisites
- Docker (version 20.10+)
- Docker Compose (version 2.0+)
- Network Access from backend services
- Ports Available: 4000, 8081, 8082, 8083
1. Initial Setup
cd backend/mpc-system
# Create environment configuration
cp .env.example .env
# Edit configuration for your environment
nano .env
2. Configure Environment
Edit .env and update the following REQUIRED values:
# Database password (REQUIRED)
POSTGRES_PASSWORD=your_secure_postgres_password
# RabbitMQ password (REQUIRED)
RABBITMQ_PASSWORD=your_secure_rabbitmq_password
# JWT secret key (REQUIRED, min 32 chars)
JWT_SECRET_KEY=your_jwt_secret_key_at_least_32_characters
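# Master encryption key (REQUIRED, exactly 64 hex chars)
# WARNING: If you lose this, encrypted shares cannot be recovered!
# Generate once with: openssl rand -hex 32, then paste the literal value here.
# Do not leave $(...) in this file: Docker Compose does not run command substitution
# in .env, and a script that sources the file would generate a different key on every run.
CRYPTO_MASTER_KEY=<64_hex_chars_from_openssl_rand_hex_32>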
# API key for server-to-server auth (REQUIRED)
# Must match the MPC_API_KEY in your backend mpc-service config
MPC_API_KEY=your_api_key_matching_mpc_service
# Allowed IPs (REQUIRED - update to actual backend server IP!)
ALLOWED_IPS=192.168.1.111
3. Deploy Services
# Start all services
./deploy.sh up
# Check status
./deploy.sh status
# View logs
./deploy.sh logs
4. Verify Deployment
# Health check
./deploy.sh health
# Test API
./deploy.sh test-api
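Services can take a few seconds to become healthy after ./deploy.sh up, so a single health check may fail spuriously. The sketch below is a small POSIX sh retry wrapper; the function name retry is hypothetical, and the usage comment assumes that ./deploy.sh health exits non-zero while any service is unhealthy, which this guide does not confirm.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$retry_n" -ge "$retry_max" ]; then
      echo "gave up after $retry_n attempts: $*" >&2
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Usage (assumption: the health subcommand signals failure via its exit code):
# retry 10 3 ./deploy.sh health
```

Keeping the attempt count and delay as arguments lets the same wrapper serve both local development and slower production hosts.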
Configuration
All configuration is managed through the .env file. See .env.example for complete documentation.
Critical Environment Variables
| Variable | Description | Required | Example |
|---|---|---|---|
| POSTGRES_PASSWORD | Database password | Yes | openssl rand -base64 32 |
| RABBITMQ_PASSWORD | Message broker password | Yes | openssl rand -base64 32 |
| JWT_SECRET_KEY | JWT signing key (≥32 chars) | Yes | openssl rand -base64 48 |
| CRYPTO_MASTER_KEY | AES-256 key (64 hex chars) | Yes | openssl rand -hex 32 |
| MPC_API_KEY | API authentication key | Yes | openssl rand -base64 48 |
| ALLOWED_IPS | Comma-separated allowed IPs | Yes | 192.168.1.111,192.168.1.112 |
| ENVIRONMENT | Environment name | No | production (default) |
| REDIS_PASSWORD | Redis password | No | Leave empty for internal network |
Generating Secure Keys
# PostgreSQL & RabbitMQ passwords
openssl rand -base64 32
# JWT Secret Key
openssl rand -base64 48
# Master Encryption Key (MUST be exactly 64 hex characters)
openssl rand -hex 32
# API Key
openssl rand -base64 48
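Because openssl rand -hex 32 emits 32 random bytes as 64 hexadecimal characters, the "exactly 64 hex characters" requirement can be checked before the key is written to .env. The following is a minimal sketch; the function name is_valid_master_key is hypothetical and is not part of deploy.sh.

```shell
#!/bin/sh
# is_valid_master_key VALUE
# Succeeds only if VALUE consists of exactly 64 hexadecimal characters (32 bytes).
is_valid_master_key() {
  case $1 in
    *[!0-9a-fA-F]*) return 1 ;;  # contains at least one non-hex character
  esac
  [ "${#1}" -eq 64 ]
}

# Usage:
# key=$(openssl rand -hex 32)
# is_valid_master_key "$key" && echo "key format OK"
```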
Configuration Checklist
Before deploying to production:
- Change all default passwords
- Generate secure CRYPTO_MASTER_KEY and back it up securely
- Set MPC_API_KEY to match backend mpc-service configuration
- Update ALLOWED_IPS to actual backend server IP(s)
- Backup .env file to secure location (NOT in git!)
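Part of this checklist can be automated. The sketch below reads a KEY=VALUE file and reports required variables that are missing, empty, or still set to a your_... placeholder like the ones in the sample configuration above. The function names env_value and check_required_env are hypothetical, and the parser is deliberately simple: it assumes unquoted values with no spaces around the equals sign.

```shell
#!/bin/sh
# env_value FILE KEY
# Prints the value assigned to KEY in FILE (the last assignment wins).
env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# check_required_env FILE
# Reports each required key that is missing, empty, or still a placeholder.
# Returns 0 only if every required key looks configured.
check_required_env() {
  cre_status=0
  for cre_key in POSTGRES_PASSWORD RABBITMQ_PASSWORD JWT_SECRET_KEY \
                 CRYPTO_MASTER_KEY MPC_API_KEY ALLOWED_IPS; do
    cre_val=$(env_value "$1" "$cre_key")
    case $cre_val in
      "")     echo "missing or empty: $cre_key" >&2; cre_status=1 ;;
      your_*) echo "still a placeholder: $cre_key" >&2; cre_status=1 ;;
    esac
  done
  return "$cre_status"
}

# Usage:
# check_required_env .env || echo "fix .env before deploying"
```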
Deployment Commands
Basic Operations
./deploy.sh up # Start all services
./deploy.sh down # Stop all services
./deploy.sh restart # Restart all services
./deploy.sh logs [svc] # View logs (all or specific service)
./deploy.sh status # Show service status
./deploy.sh health # Health check all services
Build Commands
./deploy.sh build # Build Docker images
./deploy.sh build-no-cache # Rebuild without cache
Service Management
# Infrastructure only
./deploy.sh infra up # Start postgres, redis, rabbitmq
./deploy.sh infra down # Stop infrastructure
# MPC services only
./deploy.sh mpc up # Start MPC services
./deploy.sh mpc down # Stop MPC services
./deploy.sh mpc restart # Restart MPC services
Debugging
./deploy.sh logs-tail [service] # Last 100 log lines
./deploy.sh shell [service] # Open shell in container
./deploy.sh test-api # Test Account Service API
Cleanup
# WARNING: This removes all data!
./deploy.sh clean
Services
External Services (Exposed Ports)
| Service | Port | Protocol | Purpose |
|---|---|---|---|
| account-service | 4000 | HTTP | Main API for backend integration |
| session-coordinator | 8081 | HTTP/gRPC | Session coordination |
| message-router | 8082 | WebSocket/gRPC | Message routing |
| server-party-api | 8083 | HTTP | User share generation |
Internal Services
| Service | Purpose |
|---|---|
| server-party-1/2/3 | TSS parties (Docker Compose mode - fixed IDs) |
| server-party-pool | TSS party pool (Kubernetes mode - dynamic scaling) |
| postgres | Database for session/account data |
| redis | Cache and temporary data |
| rabbitmq | Message broker for inter-service communication |
Note: In Kubernetes mode, server parties are discovered dynamically using K8s service discovery. Parties can be scaled up/down without service interruption.
Service Dependencies
Infrastructure Services (postgres, redis, rabbitmq)
↓
Session Coordinator & Message Router
↓
Server Parties (1, 2, 3) & Server Party API
↓
Account Service (external API)
Security
Access Control
- IP Whitelisting: Only IPs in ALLOWED_IPS can access the API
- API Key Authentication: Requires a valid MPC_API_KEY, sent in the X-API-Key header
- Network Isolation: Services communicate within Docker network
Data Protection
- Encryption at Rest: All shares encrypted with AES-256-GCM
- Master Key: CRYPTO_MASTER_KEY must be securely stored and backed up
- Secure Transport: Use HTTPS/TLS for external communication
Best Practices
- Never commit .env to version control
- Backup CRYPTO_MASTER_KEY to multiple secure locations
- Rotate API keys regularly
- Use strong passwords (min 32 chars)
- Restrict database ports (don't expose to internet)
- Monitor failed authentication attempts
- Enable audit logging
Key Backup
# Backup master key (CRITICAL!)
# Match only the assignment line, so comments that mention the variable are ignored
(umask 077 && grep '^CRYPTO_MASTER_KEY=' .env > master_key.backup)
# Move the file to secure storage (encrypted USB, password manager, vault)
# NEVER leave it in plaintext on the server
Troubleshooting
Services won't start
# Check logs
./deploy.sh logs
# Check specific service
./deploy.sh logs postgres
# Common issues:
# 1. Ports already in use
# 2. .env file missing or misconfigured
# 3. Database initialization failed
Database connection errors
# Check postgres health
docker compose ps postgres
# View postgres logs
./deploy.sh logs postgres
# Restart infrastructure
./deploy.sh infra down
./deploy.sh infra up
API returns 403 Forbidden
# Check ALLOWED_IPS configuration
grep ALLOWED_IPS .env
# Verify caller's IP is in the list
# Update .env and restart:
./deploy.sh restart
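The "verify caller's IP is in the list" step can be scripted. This sketch tests whether an address appears in a comma-separated list such as ALLOWED_IPS. The function name ip_allowed is hypothetical, and exact string matching is an assumption: the service's real matching rules (for example, whether CIDR ranges are accepted) are not described in this guide.

```shell
#!/bin/sh
# ip_allowed IP LIST
# Succeeds if IP is one of the entries in the comma-separated LIST.
# Spaces around entries are ignored; entries are compared as exact strings.
ip_allowed() {
  case ",$(printf '%s' "$2" | tr -d ' ')," in
    *",$1,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage:
# allowed=$(sed -n 's/^ALLOWED_IPS=//p' .env | tail -n 1)
# ip_allowed 192.168.1.111 "$allowed" || echo "caller IP is not whitelisted"
```

Wrapping the list in commas before matching prevents a prefix such as 192.168.1.11 from matching 192.168.1.111.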
API returns 401 Unauthorized
# Verify MPC_API_KEY matches between:
# 1. This system's .env
# 2. Backend mpc-service configuration
# Check API key
grep MPC_API_KEY .env
# Restart after updating
./deploy.sh restart
Keygen or signing fails
# Check all server parties are healthy
./deploy.sh health
# View server party logs
./deploy.sh logs server-party-1
./deploy.sh logs server-party-2
./deploy.sh logs server-party-3
# Check message router
./deploy.sh logs message-router
# Restart MPC services
./deploy.sh mpc restart
Lost master encryption key
CRITICAL: If CRYPTO_MASTER_KEY is lost, encrypted shares cannot be recovered!
Prevention:
- Backup key immediately after generation
- Store in multiple secure locations
- Use enterprise key management system in production
Production Deployment
Pre-Deployment Checklist
- Generate all secure keys and passwords
- Backup CRYPTO_MASTER_KEY to secure locations
- Configure ALLOWED_IPS for actual backend server
- Sync MPC_API_KEY with backend mpc-service
- Set up database backups
- Configure log aggregation
- Set up monitoring and alerts
- Document recovery procedures
- Test disaster recovery
Deployment Steps
Step 1: Prepare Environment
# On MPC server
git clone <repo> /opt/rwadurian
cd /opt/rwadurian/backend/mpc-system
# Configure environment
cp .env.example .env
nano .env # Set all required values
# Generate and backup keys
openssl rand -hex 32 > master_key.txt
# Copy to secure storage, then delete:
# rm master_key.txt
Step 2: Deploy Services
# Build images
./deploy.sh build
# Start services
./deploy.sh up
# Verify all healthy
./deploy.sh health
Step 3: Configure Firewall
# Keep SSH reachable before enabling the firewall (adjust if sshd uses a non-default port)
sudo ufw allow ssh
# Allow backend server to access MPC ports
sudo ufw allow from <BACKEND_IP> to any port 4000
sudo ufw allow from <BACKEND_IP> to any port 8081
sudo ufw allow from <BACKEND_IP> to any port 8082
sudo ufw allow from <BACKEND_IP> to any port 8083
# Deny all other external access
sudo ufw default deny incoming
sudo ufw enable
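When several backend servers need access, writing one rule per IP and port by hand is error-prone. The sketch below prints the allow rules for a comma-separated IP list in the same format as ALLOWED_IPS, without executing them, so they can be reviewed first. The function name print_ufw_rules is hypothetical; the port list is taken from the exposed ports documented in this guide.

```shell
#!/bin/sh
# print_ufw_rules IP_LIST
# Prints (does not run) one "ufw allow" rule per backend IP and MPC port.
print_ufw_rules() {
  for pur_ip in $(printf '%s' "$1" | tr ',' ' '); do
    for pur_port in 4000 8081 8082 8083; do
      echo "sudo ufw allow from $pur_ip to any port $pur_port proto tcp"
    done
  done
}

# Usage: review the output, then pipe it to sh to apply the rules.
# print_ufw_rules "192.168.1.111,192.168.1.112"
```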
Step 4: Test Integration
# From backend server, test API access
curl -H "X-API-Key: YOUR_MPC_API_KEY" \
http://<MPC_SERVER_IP>:4000/health
Monitoring
Monitor these metrics:
- Service health status
- API request rate and latency
- Failed authentication attempts
- Database connection pool usage
- RabbitMQ queue depths
- Key generation/signing success rates
Backup Strategy
# Database backup (daily)
docker compose exec postgres pg_dump -U mpc_user mpc_system > backup_$(date +%Y%m%d).sql
# Configuration backup
tar -czf config_backup_$(date +%Y%m%d).tar.gz .env kong.yml
# Encryption key backup (secure storage only!)
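Daily dumps accumulate, so a retention step is usually paired with the backup command. This sketch keeps the newest N files named backup_YYYYMMDD.sql (the pattern produced by the pg_dump command above) and deletes older ones. The function name prune_backups and the retention count are assumptions, and the approach relies on the date in the filename sorting lexicographically rather than on file timestamps.

```shell
#!/bin/sh
# prune_backups DIR KEEP
# Deletes the oldest backup_YYYYMMDD.sql files in DIR so that at most KEEP remain.
# Files that do not match the naming pattern are left untouched.
prune_backups() {
  pb_dir=$1
  pb_keep=$2
  pb_total=$(ls "$pb_dir" | grep -c '^backup_[0-9]\{8\}\.sql$' || true)
  pb_excess=$((pb_total - pb_keep))
  [ "$pb_excess" -gt 0 ] || return 0
  ls "$pb_dir" | grep '^backup_[0-9]\{8\}\.sql$' | sort | head -n "$pb_excess" |
    while IFS= read -r pb_name; do
      rm -f -- "$pb_dir/$pb_name"
    done
}

# Usage: keep the 14 most recent daily dumps in the current directory.
# prune_backups . 14
```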
Disaster Recovery
- Service Failure: Restart affected service using ./deploy.sh restart
- Database Corruption: Restore from latest backup
- Key Loss: If CRYPTO_MASTER_KEY is lost, all encrypted shares are unrecoverable
- Full System Recovery: Redeploy from backups, restore database
Performance Tuning
# docker-compose.yml - adjust resources
services:
  session-coordinator:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
API Reference
Account Service API (Port 4000)
# Health check
curl http://localhost:4000/health
# Create account (keygen)
curl -X POST http://localhost:4000/api/v1/accounts \
-H "X-API-Key: YOUR_MPC_API_KEY" \
-H "Content-Type: application/json" \
-d '{"user_id": "user123"}'
# Sign transaction
curl -X POST http://localhost:4000/api/v1/accounts/{account_id}/sign \
-H "X-API-Key: YOUR_MPC_API_KEY" \
-H "Content-Type: application/json" \
-d '{"message": "tx_hash"}'
Server Party API (Port 8083)
# Generate user share
curl -X POST http://localhost:8083/api/v1/shares/generate \
-H "X-API-Key: YOUR_MPC_API_KEY" \
-H "Content-Type: application/json" \
-d '{"session_id": "session123"}'
Getting Help
- Check logs: ./deploy.sh logs
- Health check: ./deploy.sh health
- View commands: ./deploy.sh help
- Review .env.example for configuration options
License
Copyright © 2024 RWADurian. All rights reserved.