rwadurian/backend/mpc-system
hailin 26ef03a1bc fix(android): configure OkHttpClient connection pool and add resource cleanup [P1-2]
[Architecture safety fix - prevent OkHttpClient resource leaks]

## Problem Background

OkHttpClient maintains several internal resources:
1. ConnectionPool - reuses HTTP connections
2. Dispatcher - manages the request thread pool
3. Cache - optional response cache

Without connection pool limits and explicit cleanup:
1. The connection pool can grow without bound → memory leak
2. Idle connections are kept forever → wasted system resources (file descriptors, sockets)
3. The Dispatcher thread pool is never shut down → thread leak

## Fix

### 1. Configure the connection pool

Limit the pool size and the idle keep-alive time:
- maxIdleConnections: 5 (keep at most 5 idle connections)
- keepAliveDuration: 5 minutes (idle connection keep-alive)

Changed in:
- TssRepository.kt httpClient
- TransactionUtils.kt client
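A minimal sketch of the pool configuration described above (OkHttp 4.x API; the property name `httpClient` follows the file list above, not the actual source):

```kotlin
import okhttp3.ConnectionPool
import okhttp3.OkHttpClient
import java.util.concurrent.TimeUnit

// Shared client: at most 5 idle connections, each kept alive for 5 minutes.
val httpClient: OkHttpClient = OkHttpClient.Builder()
    .connectionPool(ConnectionPool(5, 5, TimeUnit.MINUTES))
    .build()
```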

### 2. Release resources in cleanup()

TssRepository.cleanup() now shuts down the OkHttpClient resources.
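A sketch of the cleanup logic, assuming an OkHttp 4.x client (the free-standing function name is illustrative; in the actual code this runs inside TssRepository.cleanup()):

```kotlin
import okhttp3.OkHttpClient

// Release all resources held by an OkHttpClient instance.
fun cleanupHttpClient(httpClient: OkHttpClient) {
    httpClient.dispatcher.executorService.shutdown()  // stop the dispatcher's thread pool
    httpClient.connectionPool.evictAll()              // close every idle connection
    httpClient.cache?.close()                         // release the response cache, if configured
}
```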

### 3. Expose a cleanup method on TransactionUtils

Although TransactionUtils is an object singleton, a cleanup() method lets callers:
1. Release resources in test environments
2. Free resources when the app fully exits
3. Proactively clean up under memory pressure
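A hypothetical shape of the singleton with both changes combined (the real TransactionUtils has more members; OkHttp 4.x API assumed):

```kotlin
import okhttp3.ConnectionPool
import okhttp3.OkHttpClient
import java.util.concurrent.TimeUnit

object TransactionUtils {
    // Bounded pool: max 5 idle connections, 5-minute keep-alive.
    val client: OkHttpClient = OkHttpClient.Builder()
        .connectionPool(ConnectionPool(5, 5, TimeUnit.MINUTES))
        .build()

    // Safe to call from tests, on full app exit, or under memory pressure.
    fun cleanup() {
        client.dispatcher.executorService.shutdown()
        client.connectionPool.evictAll()
        client.cache?.close()
    }
}
```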

## Memory Leak Risks Addressed

### Scenario 1: Unbounded connection pool growth
- Problem: without maxIdleConnections, the pool could grow without limit
- Impact: each connection holds a socket; exhausting file descriptors blocks new connections
- Fix: cap the pool at 5 idle connections

### Scenario 2: Idle connections kept forever
- Problem: without keepAliveDuration, idle connections were never closed
- Impact: wastes server resources; network middleboxes may drop long-idle connections anyway
- Fix: close idle connections automatically after 5 minutes

### Scenario 3: Resources not released on app exit
- Problem: cleanup() did not release OkHttpClient resources
- Impact: open thread pools and connections delay app exit and can cause ANRs
- Fix: cleanup() now explicitly shuts down the connection pool and dispatcher

### Scenario 4: Resource buildup across rapid Activity recreation
- Problem: TssRepository is a singleton, but clients created ad hoc during rapid recreation were never cleaned up
- Impact: resources accumulate from temporary clients (e.g. in getBalance and getTokenBalance)
- Note: these call sites should reuse the shared httpClient instead of creating a new client each time

## Scope

### Files changed
1. TssRepository.kt
   - Configure the httpClient ConnectionPool
   - Release OkHttpClient resources in cleanup()

2. TransactionUtils.kt
   - Configure the client ConnectionPool
   - Add a cleanup() method

### Behavior change
- BEFORE: unbounded connection pool, no resource cleanup
- AFTER: pool capped at 5 idle connections with a 5-minute keep-alive; cleanup() releases all resources

## Verification

Build status: BUILD SUCCESSFUL in 39s
- No compile errors
- Warnings only (unused parameters), no functional impact

## Possible Follow-ups

Suggested further improvements:
1. Use a single shared OkHttpClient everywhere - avoid creating multiple temporary clients in TssRepository
2. Monitor connection pool usage - log the pool size
3. Tune parameters to real traffic - raise maxIdleConnections if concurrent requests are frequent

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 21:47:39 -08:00
.claude refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing 2025-12-05 08:11:28 -08:00
api fix(mpc-system): return the actual threshold_n and threshold_t from GetSessionStatus 2025-12-29 11:59:53 -08:00
docs feat(mpc-system): add event sourcing for session tracking 2025-12-05 23:31:04 -08:00
k8s feat(mpc-system): implement party role labels with strict persistent-only default 2025-12-05 07:08:59 -08:00
migrations fix(migration): make database migration scripts idempotent so they can be re-run 2025-12-28 05:26:38 -08:00
pkg fix(tss): convert threshold to tss-lib format (threshold-1) in all keygen and signing 2025-12-31 12:19:58 -08:00
scripts fix: convert deploy.sh CRLF to LF and add executable permission 2025-12-07 07:01:13 -08:00
services fix(android): configure OkHttpClient connection pool and add resource cleanup [P1-2] 2026-01-26 21:47:39 -08:00
tests refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing 2025-12-05 08:11:28 -08:00
.env.example docs(config): update .env.example files for production deployment 2025-12-07 04:55:21 -08:00
.env.party.example feat(mpc-system): add signing parties configuration and delegate signing support 2025-12-05 22:47:55 -08:00
.env.prod.example feat(mpc-system): add signing parties configuration and delegate signing support 2025-12-05 22:47:55 -08:00
.gitignore refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing 2025-12-05 08:11:28 -08:00
DELEGATE_PARTY_GUIDE.md feat(mpc-system): implement delegate party for hybrid custody 2025-12-05 09:07:46 -08:00
MPC-Distributed-Signature-System-Complete-Spec.md refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing 2025-12-05 08:11:28 -08:00
MPC_INTEGRATION_GUIDE.md refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing 2025-12-05 08:11:28 -08:00
Makefile refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing 2025-12-05 08:11:28 -08:00
PARTY_ROLE_VERIFICATION_REPORT.md refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing 2025-12-05 08:11:28 -08:00
README.md feat(mpc-system): implement Kubernetes-based dynamic party pool architecture 2025-12-05 06:12:49 -08:00
TEST_REPORT.md refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing 2025-12-05 08:11:28 -08:00
VERIFICATION_REPORT.md refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing 2025-12-05 08:11:28 -08:00
config.example.yaml refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing 2025-12-05 08:11:28 -08:00
deploy.sh feat(mpc-system): add server-party-co-managed for co_managed_keygen sessions 2025-12-29 23:54:45 -08:00
docker-compose.party.yml chore(docker): add timezone configuration for mpc-system, api-gateway, and infrastructure 2025-12-23 18:35:09 -08:00
docker-compose.prod.yml chore(docker): add timezone configuration for mpc-system, api-gateway, and infrastructure 2025-12-23 18:35:09 -08:00
docker-compose.yml feat(mpc-system): add server-party-co-managed for co_managed_keygen sessions 2025-12-29 23:54:45 -08:00
get-docker.sh refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing 2025-12-05 08:11:28 -08:00
go.mod feat(mpc-system): implement party-driven architecture with SessionEvent broadcasting 2025-12-05 08:44:05 -08:00
go.sum feat(mpc-system): implement party-driven architecture with SessionEvent broadcasting 2025-12-05 08:44:05 -08:00
test_create_session.go feat: add keygen_session_id to signing session flow 2025-12-06 08:39:40 -08:00
test_real_scenario.sh refactor(mpc-system): migrate to party-driven architecture with PartyID-based routing 2025-12-05 08:11:28 -08:00
test_signing.go test: update signing test username 2025-12-06 10:54:22 -08:00
test_signing_parties_api.go fix: update test username for signing parties API test 2025-12-06 10:29:30 -08:00

README.md

MPC System Deployment Guide

Multi-Party Computation (MPC) system for secure threshold signature scheme (TSS) implementation in the RWADurian project.

Overview

The MPC system implements a 2-of-3 threshold signature scheme where:

  • Server parties from a dynamically scalable pool hold key shares
  • At least 2 parties are required to generate signatures (configurable threshold)
  • User shares are generated dynamically and returned to the calling service
  • All shares are encrypted using AES-256-GCM

Key Features

  • Threshold Cryptography: Configurable N-of-M TSS for enhanced security
  • Dynamic Party Pool: Kubernetes-based service discovery for automatic party scaling
  • Distributed Architecture: Services communicate via gRPC and WebSocket
  • Secure Storage: AES-256-GCM encryption for all stored shares
  • API Authentication: API key and IP-based access control
  • Session Management: Coordinated multi-party computation sessions
  • MPC Protocol Compliance: DeviceInfo optional, aligning with international MPC standards

Architecture

┌────────────────────────────────────────────────────────────────┐
│                         MPC System                              │
│                                                                 │
│  ┌──────────────────┐        ┌──────────────────┐              │
│  │ Account Service  │        │ Server Party API │              │
│  │  (Port 4000)     │        │  (Port 8083)     │              │
│  │ External API     │        │ User Share Gen   │              │
│  └────────┬─────────┘        └────────┬─────────┘              │
│           │                           │                        │
│           ▼                           ▼                        │
│  ┌──────────────────┐        ┌──────────────────┐              │
│  │   Session        │◄──────►│ Message Router   │              │
│  │   Coordinator    │        │  (Port 8082)     │              │
│  │  (Port 8081)     │        │  WebSocket       │              │
│  └────────┬─────────┘        └────────┬─────────┘              │
│           │                           │                        │
│           ▼                           ▼                        │
│  ┌────────────────────────────────────────────┐                │
│  │   Server Party Pool (Dynamically Scalable) │                │
│  │   ┌──────────┐ ┌──────────┐ ┌──────────┐  │                │
│  │   │ Party 1  │ │ Party 2  │ │ Party 3  │  │  K8s Discovery │
│  │   │  (TSS)   │ │  (TSS)   │ │  (TSS)   │  │  Auto-selected │
│  │   └──────────┘ └──────────┘ └──────────┘  │  from pool     │
│  │   ┌──────────┐     ... can scale up/down  │                │
│  │   │ Party N  │                             │                │
│  │   └──────────┘                             │                │
│  └────────────────────────────────────────────┘                │
│                                                                 │
│  ┌────────────────────────────────────────────┐                │
│  │         Infrastructure Services            │                │
│  │  PostgreSQL  │  Redis  │  RabbitMQ         │                │
│  └────────────────────────────────────────────┘                │
└────────────────────────────────────────────────────────────────┘
                           │
                           │ Network Access
                           ▼
              ┌──────────────────────────┐
              │   Backend Services       │
              │   mpc-service (caller)   │
              └──────────────────────────┘

Deployment Options

This system supports two deployment modes:

Option 1: Docker Compose (Development/Simple Deployment)

  • Quick setup for development or simple production environments
  • Fixed 3 server parties (hardcoded IDs)
  • See instructions below in "Quick Start"

Option 2: Kubernetes (Production/Scalable Deployment)

  • Dynamic party pool with service discovery
  • Horizontally scalable server parties
  • Recommended for production environments
  • See k8s/README.md for detailed instructions

Quick Start (Docker Compose)

Prerequisites

  • Docker (version 20.10+)
  • Docker Compose (version 2.0+)
  • Network Access from backend services
  • Ports Available: 4000, 8081, 8082, 8083

1. Initial Setup

cd backend/mpc-system

# Create environment configuration
cp .env.example .env

# Edit configuration for your environment
nano .env

2. Configure Environment

Edit .env and update the following REQUIRED values:

# Database password (REQUIRED)
POSTGRES_PASSWORD=your_secure_postgres_password

# RabbitMQ password (REQUIRED)
RABBITMQ_PASSWORD=your_secure_rabbitmq_password

# JWT secret key (REQUIRED, min 32 chars)
JWT_SECRET_KEY=your_jwt_secret_key_at_least_32_characters

# Master encryption key (REQUIRED, exactly 64 hex chars)
# WARNING: If you lose this, encrypted shares cannot be recovered!
CRYPTO_MASTER_KEY=$(openssl rand -hex 32)

# API key for server-to-server auth (REQUIRED)
# Must match the MPC_API_KEY in your backend mpc-service config
MPC_API_KEY=your_api_key_matching_mpc_service

# Allowed IPs (REQUIRED - update to actual backend server IP!)
ALLOWED_IPS=192.168.1.111

3. Deploy Services

# Start all services
./deploy.sh up

# Check status
./deploy.sh status

# View logs
./deploy.sh logs

4. Verify Deployment

# Health check
./deploy.sh health

# Test API
./deploy.sh test-api

Configuration

All configuration is managed through .env file. See .env.example for complete documentation.

Critical Environment Variables

Variable           Description                    Required  Example
POSTGRES_PASSWORD  Database password              Yes       openssl rand -base64 32
RABBITMQ_PASSWORD  Message broker password        Yes       openssl rand -base64 32
JWT_SECRET_KEY     JWT signing key (≥32 chars)    Yes       openssl rand -base64 48
CRYPTO_MASTER_KEY  AES-256 key (64 hex chars)     Yes       openssl rand -hex 32
MPC_API_KEY        API authentication key         Yes       openssl rand -base64 48
ALLOWED_IPS        Comma-separated allowed IPs    Yes       192.168.1.111,192.168.1.112
ENVIRONMENT        Environment name               No        production (default)
REDIS_PASSWORD     Redis password                 No        Leave empty for internal network

Generating Secure Keys

# PostgreSQL & RabbitMQ passwords
openssl rand -base64 32

# JWT Secret Key
openssl rand -base64 48

# Master Encryption Key (MUST be exactly 64 hex characters)
openssl rand -hex 32

# API Key
openssl rand -base64 48

Configuration Checklist

Before deploying to production:

  • Change all default passwords
  • Generate secure CRYPTO_MASTER_KEY and back it up securely
  • Set MPC_API_KEY to match backend mpc-service configuration
  • Update ALLOWED_IPS to actual backend server IP(s)
  • Backup .env file to secure location (NOT in git!)

Deployment Commands

Basic Operations

./deploy.sh up          # Start all services
./deploy.sh down        # Stop all services
./deploy.sh restart     # Restart all services
./deploy.sh logs [svc]  # View logs (all or specific service)
./deploy.sh status      # Show service status
./deploy.sh health      # Health check all services

Build Commands

./deploy.sh build            # Build Docker images
./deploy.sh build-no-cache   # Rebuild without cache

Service Management

# Infrastructure only
./deploy.sh infra up    # Start postgres, redis, rabbitmq
./deploy.sh infra down  # Stop infrastructure

# MPC services only
./deploy.sh mpc up      # Start MPC services
./deploy.sh mpc down    # Stop MPC services
./deploy.sh mpc restart # Restart MPC services

Debugging

./deploy.sh logs-tail [service]  # Last 100 log lines
./deploy.sh shell [service]      # Open shell in container
./deploy.sh test-api             # Test Account Service API

Cleanup

# WARNING: This removes all data!
./deploy.sh clean

Services

External Services (Exposed Ports)

Service              Port  Protocol        Purpose
account-service      4000  HTTP            Main API for backend integration
session-coordinator  8081  HTTP/gRPC       Session coordination
message-router       8082  WebSocket/gRPC  Message routing
server-party-api     8083  HTTP            User share generation

Internal Services

Service             Purpose
server-party-1/2/3  TSS parties (Docker Compose mode - fixed IDs)
server-party-pool   TSS party pool (Kubernetes mode - dynamic scaling)
postgres            Database for session/account data
redis               Cache and temporary data
rabbitmq            Message broker for inter-service communication
Note: In Kubernetes mode, server parties are discovered dynamically using K8s service discovery. Parties can be scaled up/down without service interruption.

Service Dependencies

Infrastructure Services (postgres, redis, rabbitmq)
    ↓
Session Coordinator & Message Router
    ↓
Server Parties (1, 2, 3) & Server Party API
    ↓
Account Service (external API)

Security

Access Control

  1. IP Whitelisting: Only IPs in ALLOWED_IPS can access the API
  2. API Key Authentication: Requires valid MPC_API_KEY header
  3. Network Isolation: Services communicate within Docker network

Data Protection

  1. Encryption at Rest: All shares encrypted with AES-256-GCM
  2. Master Key: CRYPTO_MASTER_KEY must be securely stored and backed up
  3. Secure Transport: Use HTTPS/TLS for external communication
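The AES-256-GCM scheme above can be sketched with JDK primitives. This is an illustration only: the production implementation lives in the Go services, the function names encryptShare/decryptShare are hypothetical, and raw 32-byte key handling (decoded from CRYPTO_MASTER_KEY) is an assumption.

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.SecretKeySpec

// Encrypt a share blob; output layout is 12-byte nonce || ciphertext+tag.
fun encryptShare(key: ByteArray, plaintext: ByteArray): ByteArray {
    require(key.size == 32) { "AES-256 requires a 32-byte key" }
    val nonce = ByteArray(12).also { SecureRandom().nextBytes(it) }  // 96-bit GCM nonce
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, SecretKeySpec(key, "AES"), GCMParameterSpec(128, nonce))
    return nonce + cipher.doFinal(plaintext)
}

// Decrypt; GCM authentication fails loudly if the blob or key is wrong.
fun decryptShare(key: ByteArray, blob: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, SecretKeySpec(key, "AES"),
        GCMParameterSpec(128, blob.copyOfRange(0, 12)))
    return cipher.doFinal(blob.copyOfRange(12, blob.size))
}
```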

Best Practices

  • Never commit .env to version control
  • Backup CRYPTO_MASTER_KEY to multiple secure locations
  • Rotate API keys regularly
  • Use strong passwords (min 32 chars)
  • Restrict database ports (don't expose to internet)
  • Monitor failed authentication attempts
  • Enable audit logging

Key Backup

# Backup master key (CRITICAL!)
echo "CRYPTO_MASTER_KEY=$(grep CRYPTO_MASTER_KEY .env | cut -d= -f2)" > master_key.backup

# Store securely (encrypted USB, password manager, vault)
# NEVER store in plaintext on the server

Troubleshooting

Services won't start

# Check logs
./deploy.sh logs

# Check specific service
./deploy.sh logs postgres

# Common issues:
# 1. Ports already in use
# 2. .env file missing or misconfigured
# 3. Database initialization failed

Database connection errors

# Check postgres health
docker compose ps postgres

# View postgres logs
./deploy.sh logs postgres

# Restart infrastructure
./deploy.sh infra down
./deploy.sh infra up

API returns 403 Forbidden

# Check ALLOWED_IPS configuration
grep ALLOWED_IPS .env

# Verify caller's IP is in the list
# Update .env and restart:
./deploy.sh restart

API returns 401 Unauthorized

# Verify MPC_API_KEY matches between:
# 1. This system's .env
# 2. Backend mpc-service configuration

# Check API key
grep MPC_API_KEY .env

# Restart after updating
./deploy.sh restart

Keygen or signing fails

# Check all server parties are healthy
./deploy.sh health

# View server party logs
./deploy.sh logs server-party-1
./deploy.sh logs server-party-2
./deploy.sh logs server-party-3

# Check message router
./deploy.sh logs message-router

# Restart MPC services
./deploy.sh mpc restart

Lost master encryption key

CRITICAL: If CRYPTO_MASTER_KEY is lost, encrypted shares cannot be recovered!

Prevention:

  • Backup key immediately after generation
  • Store in multiple secure locations
  • Use enterprise key management system in production

Production Deployment

Pre-Deployment Checklist

  • Generate all secure keys and passwords
  • Backup CRYPTO_MASTER_KEY to secure locations
  • Configure ALLOWED_IPS for actual backend server
  • Sync MPC_API_KEY with backend mpc-service
  • Set up database backups
  • Configure log aggregation
  • Set up monitoring and alerts
  • Document recovery procedures
  • Test disaster recovery

Deployment Steps

Step 1: Prepare Environment

# On MPC server
git clone <repo> /opt/rwadurian
cd /opt/rwadurian/backend/mpc-system

# Configure environment
cp .env.example .env
nano .env  # Set all required values

# Generate and backup keys
openssl rand -hex 32 > master_key.txt
# Copy to secure storage, then delete:
# rm master_key.txt

Step 2: Deploy Services

# Build images
./deploy.sh build

# Start services
./deploy.sh up

# Verify all healthy
./deploy.sh health

Step 3: Configure Firewall

# Allow backend server to access MPC ports
sudo ufw allow from <BACKEND_IP> to any port 4000
sudo ufw allow from <BACKEND_IP> to any port 8081
sudo ufw allow from <BACKEND_IP> to any port 8082
sudo ufw allow from <BACKEND_IP> to any port 8083

# Deny all other external access
sudo ufw default deny incoming
sudo ufw enable

Step 4: Test Integration

# From backend server, test API access
curl -H "X-API-Key: YOUR_MPC_API_KEY" \
  http://<MPC_SERVER_IP>:4000/health

Monitoring

Monitor these metrics:

  • Service health status
  • API request rate and latency
  • Failed authentication attempts
  • Database connection pool usage
  • RabbitMQ queue depths
  • Key generation/signing success rates

Backup Strategy

# Database backup (daily)
docker compose exec postgres pg_dump -U mpc_user mpc_system > backup_$(date +%Y%m%d).sql

# Configuration backup
tar -czf config_backup_$(date +%Y%m%d).tar.gz .env kong.yml

# Encryption key backup (secure storage only!)

Disaster Recovery

  1. Service Failure: Restart affected service using ./deploy.sh restart
  2. Database Corruption: Restore from latest backup
  3. Key Loss: If CRYPTO_MASTER_KEY lost, all encrypted shares are unrecoverable
  4. Full System Recovery: Redeploy from backups, restore database

Performance Tuning

# docker-compose.yml - adjust resources
services:
  session-coordinator:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G

API Reference

Account Service API (Port 4000)

# Health check
curl http://localhost:4000/health

# Create account (keygen)
curl -X POST http://localhost:4000/api/v1/accounts \
  -H "X-API-Key: YOUR_MPC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "user123"}'

# Sign transaction
curl -X POST http://localhost:4000/api/v1/accounts/{account_id}/sign \
  -H "X-API-Key: YOUR_MPC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message": "tx_hash"}'

Server Party API (Port 8083)

# Generate user share
curl -X POST http://localhost:8083/api/v1/shares/generate \
  -H "X-API-Key: YOUR_MPC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"session_id": "session123"}'

Getting Help

  • Check logs: ./deploy.sh logs
  • Health check: ./deploy.sh health
  • View commands: ./deploy.sh help
  • Review .env.example for configuration options

License

Copyright © 2024 RWADurian. All rights reserved.