- Add PartyIndex field to protobuf ParticipantInfo message
- Pass original PartyIndex from account shares to session coordinator
- Use original PartyIndex instead of loop variable when creating participants
- This fixes TSS signing failures when non-consecutive parties are selected
Add logging at key points to trace keygen_session_id flow:
- Account Handler: log keygen_session_id when creating signing session
- Session Coordinator: log keygen_session_id in CreateSession and JoinSession
- Message Router: log keygen_session_id when proxying JoinSession
- Server Party: log keygen_session_id when joining session
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Regenerate protobuf Go code with KeygenSessionId fields
- Session Coordinator correctly parses, stores, and returns keygen_session_id
- Message Router Client parses keygen_session_id in JoinSession response
- participate_signing.go uses keygen_session_id for precise share lookup
- Database schema already includes keygen_session_id column
This fixes the signing issue where wrong keyshares were loaded for multi-account scenarios.
Update session repository to properly handle delegate_party_id column:
- Add delegate_party_id to Save method INSERT and UPDATE statements
- Add DelegatePartyID field to sessionRow struct
- Update FindByUUID, FindByStatus, FindExpired, FindActive SELECT queries
- Update scanSessions method to scan and pass delegate_party_id
- Remove placeholder empty string, now loads actual value from database
This completes the delegate party functionality by ensuring the delegate party ID
is persisted and retrieved correctly from the database.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed critical bug where PartyComposition (persistent/delegate party counts) was being sent
by account-service in gRPC request but was not being extracted and passed to the CreateSession
use case, causing delegate party selection to fail.
Changes:
- Extract PartyComposition from protobuf request and pass to CreateSessionInput
- Add logging for party composition values in gRPC handler
- Return delegate_party_id and selected_parties in CreateSessionResponse
- Load session after creation to get delegate party ID
This fixes the issue where require_delegate=true had no effect and all parties selected
were persistent parties instead of 2 persistent + 1 delegate.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove dynamic participant addition in JoinSession
- Participants must be pre-created in CreateSession
- Add ErrPartyNotInvited error for unauthorized join attempts
- Fix Redis adapter to include version parameter in ReconstructSession
- This fixes VSS verification failures caused by inconsistent party indices
Implement version-based optimistic locking to prevent concurrent update conflicts
when multiple parties simultaneously report completion during keygen operations.
Changes:
- Add version column to mpc_sessions table (migration 004)
- Add Version field to MPCSession entity
- Define ErrOptimisticLockConflict error
- Update SessionPostgresRepo.Update() to check version and increment on success
- Add automatic retry logic (max 3 attempts) to ReportCompletionUseCase
- Update Save and all query methods (FindByStatus, FindExpired, etc.) to handle version field
This replaces pessimistic locking (FOR UPDATE) with optimistic locking using
the industry-standard pattern: WHERE version = $n and checking rowsAffected.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added test debug log immediately after logger initialization.
If debug logging is working, we should see this message.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add comprehensive debug logs to:
1. report_completion.go - log all participant statuses at key points
2. session_postgres_repo.go - log before/after each participant update
This will help identify why server-party-1 status remains 'invited'
despite successfully reporting completion.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Problem:
Multiple parties reporting completion simultaneously caused lost updates
because each transaction would read the full session, modify their
participant status, then update ALL participants - causing last-write-wins
behavior.
Solution:
Add SELECT ... FOR UPDATE locks on both mpc_sessions and participants
tables at the start of the Update transaction. This serializes concurrent
updates and prevents lost updates.
Lock order:
1. Lock session row (FOR UPDATE)
2. Lock all participant rows for this session (FOR UPDATE)
3. Perform updates
4. Commit (releases locks)
This ensures that concurrent ReportCompletion calls are fully serialized
and each participant status update is preserved.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fix critical concurrency bug where simultaneous ReportCompletion calls from
multiple parties could cause lost database updates. Changed from UPSERT-all
to UPDATE-individual pattern to ensure each participant status update is
atomic and won't be overwritten by concurrent transactions.
Before: All participants were UPSERTed in single transaction, causing
last-commit-wins behavior that lost earlier status updates.
After: Each participant is UPDATEd individually using UPDATE...WHERE, then
INSERT only if row doesn't exist. This prevents concurrent updates to
different participants from conflicting.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add switch-case to handle Invited, Joined, and Ready states
- Auto-transition Invited -> Joined -> Ready -> Completed
- Auto-transition Joined -> Ready -> Completed
- Auto-transition Ready -> Completed
- Return error for invalid states (Failed, Completed, etc.)
- Fixes 'cannot transition to completed status' error
- Applies to all parties including server-party-api
ReportCompletion was failing with "cannot transition to completed status"
because participants were in Joined state trying to transition directly to
Completed, which violates the state machine flow: Joined -> Ready -> Completed.
Changes:
- Check participant status before marking as Completed
- Auto-transition Joined -> Ready if needed
- Then transition Ready -> Completed
- Add debug logging for auto-transition
This fixes the error seen during keygen completion.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The JoinSessionResponse from coordinator was missing party_index field,
causing message router to try finding self's index in OtherParties (which
only contains other parties). This resulted in incorrect party index
assignment leading to "duplicate indexes" error in TSS keygen.
Changes:
- Add party_index field to coordinator's JoinSessionResponse proto
- Coordinator now includes PartyIndex in gRPC response
- Message router uses party_index from coordinator instead of searching
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add signing-config API endpoints (POST/PUT/DELETE/GET) for configuring
which parties should participate in signing operations
- Add SigningParties field to Account entity with database migration
- Modify CreateSigningSession to use configured parties if set,
otherwise use all active parties (backward compatible)
- Add delegate party signing support: user provides encrypted share
at sign time for delegate party to use
- Update protobuf definitions for DelegateUserShare in session events
- Add ShareTypeDelegate to support hybrid custody model
API endpoints:
- POST /accounts/:id/signing-config - Set signing parties (first time)
- PUT /accounts/:id/signing-config - Update signing parties
- DELETE /accounts/:id/signing-config - Clear config (use all parties)
- GET /accounts/:id/signing-config - Get current configuration
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Update all Dockerfiles from Go 1.21 to Go 1.24 (required by go.mod)
- Fix line endings in deploy.sh and .env.example for Unix compatibility
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add GetRegisteredParties gRPC method to Message Router for party discovery
- Create MessageRouterPartyDiscovery adapter in Session Coordinator
- Remove K8s dependency from Session Coordinator (works in any environment)
- Add party registration to server-party-api on startup
- Fix docker-compose.yml: add MESSAGE_ROUTER_ADDR to session-coordinator
This change implements a fully decentralized party discovery mechanism:
- Parties register themselves to Message Router on startup
- Session Coordinator queries Message Router for available parties
- Works in Docker Compose, K8s, or any deployment environment
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove Address field from PartyEndpoint (parties connect to router themselves)
- Update K8s Discovery to only manage PartyID and Role labels
- Add Party registration and SessionEvent protobuf definitions
- Implement PartyRegistry and SessionEventBroadcaster domain logic
- Add RegisterParty and SubscribeSessionEvents gRPC handlers
- Prepare infrastructure for party-driven MPC coordination
This is the first phase of migrating from coordinator-driven to party-driven
architecture following international MPC system design patterns.
Implement Solution 1 (Party Role Labels) to differentiate between persistent
and delegate parties, with strict security guarantees for MPC threshold systems.
Key Features:
- PartyRole enum: persistent, delegate, temporary
- K8s pod labels (party-role) for role identification
- Role-based party filtering and selection
- Strict persistent-only default policy (no fallback)
- Optional PartyComposition for custom party requirements
Security Guarantees:
- Default: MUST use persistent parties (store shares in database)
- Fail fast with clear error if insufficient persistent parties
- No silent fallback to mixed/delegate parties
- Empty PartyComposition validation prevents accidental bypass
- MPC system compatibility maintained
Implementation:
1. Added PartyRole type with persistent/delegate/temporary constants
2. Extended PartyEndpoint with Role field
3. K8s party discovery extracts role from pod labels (defaults to persistent)
4. Session creation logic with strict persistent requirement
5. PartyComposition support for explicit mixed-role sessions
6. K8s deployment files with party-role labels
Files Modified:
- services/session-coordinator/application/ports/output/party_pool_port.go
- services/session-coordinator/infrastructure/k8s/party_discovery.go
- services/session-coordinator/application/ports/input/session_management_port.go
- services/session-coordinator/application/use_cases/create_session.go
- k8s/server-party-deployment.yaml (persistent role)
Files Added:
- k8s/server-party-api-deployment.yaml (delegate role)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major architectural refactoring to align with international MPC standards
and enable horizontal scalability.
## Core Changes
### 1. DeviceInfo Made Optional
- Modified DeviceInfo.Validate() to allow empty device information
- Aligns with international MPC protocol standards
- MPC protocol layer should not mandate device-specific metadata
- Location: services/session-coordinator/domain/entities/device_info.go
### 2. Kubernetes Party Discovery Service
- Created infrastructure/k8s/party_discovery.go (220 lines)
- Implements dynamic service discovery via Kubernetes API
- Supports in-cluster config and kubeconfig fallback
- Auto-refreshes party list every 30s (configurable)
- Health-aware selection (only ready pods)
- Uses pod names as unique party IDs
### 3. Party Pool Architecture
- Defined PartyPoolPort interface for abstraction
- CreateSessionUseCase now supports automatic party selection
- When no participants specified, selects from K8s pool
- Graceful fallback to dynamic join mode if discovery fails
- Location: services/session-coordinator/application/ports/output/party_pool_port.go
### 4. Integration Updates
- Modified CreateSessionUseCase to inject partyPool
- Updated session-coordinator main.go to initialize K8s discovery
- gRPC handler already supports optional participants
- Added k8s client-go dependencies (v0.29.0) to go.mod
## Kubernetes Deployment
### New K8s Manifests
- k8s/namespace.yaml: mpc-system namespace
- k8s/configmap.yaml: shared configuration
- k8s/secrets-example.yaml: secrets template
- k8s/server-party-deployment.yaml: scalable party pool (3+ replicas)
- k8s/session-coordinator-deployment.yaml: coordinator with RBAC
- k8s/README.md: comprehensive deployment guide
### RBAC Configuration
- ServiceAccount for session-coordinator
- Role with pods/services get/list/watch permissions
- RoleBinding to grant discovery capabilities
## Key Features
✅ Dynamic service discovery via Kubernetes API
✅ Horizontal scaling (kubectl scale deployment)
✅ No hardcoded party IDs
✅ Health-aware party selection
✅ Graceful degradation when K8s unavailable
✅ MPC protocol compliance (optional DeviceInfo)
## Deployment Modes
### Docker Compose (Existing)
- Fixed 3 parties (server-party-1/2/3)
- Quick setup for development
- Backward compatible
### Kubernetes (New)
- Dynamic party pool
- Auto-discovery and scaling
- Production-ready
## Documentation
- Updated main README.md with deployment options
- Added architecture diagram showing scalable party pool
- Created comprehensive k8s/README.md with:
- Quick start guide
- Scaling instructions
- Troubleshooting section
- RBAC configuration details
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Simplified participant list handling in JoinSession client
- Added debug logging for party_index conversion in gRPC messages
- Removed redundant party filtering logic
- Added detailed logging to trace protobuf field values
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added debug logging to track participant details including party_index in:
- account service MPC keygen handler
- session coordinator gRPC client
- session coordinator gRPC handler
This helps debug the party index assignment issue where all parties
were receiving index 0 instead of unique indices (0, 1, 2).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add retry mechanism for PostgreSQL connections (10 retries, 2s base delay)
- Add retry mechanism for RabbitMQ connections (10 retries, 2s base delay)
- Add retry mechanism for Redis connections (10 retries, 2s base delay)
- Use exponential backoff: delay increases with each retry attempt
- Log detailed retry information (attempt number, max retries, errors)
- Redis continues without cache if all retries fail (non-critical)
- Database and RabbitMQ return error after all retries (critical)
This resolves startup failures when dependent services are slow to initialize,
particularly RabbitMQ which may pass health checks but not be fully ready.
- Created .env.example files with comprehensive security warnings
- Removed hardcoded IP addresses and credentials from docker-compose files
- Made database passwords mandatory (fail-fast on missing config)
- Removed Chinese mirror sources from all Dockerfiles
- Enhanced deploy.sh scripts with .env validation and auto-creation
- Added comprehensive README.md deployment guides
- Changed ALLOWED_IPS default to enable cross-server deployment
- Updated all docker-compose files to use environment variables
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major changes:
- Add TSS core library (pkg/tss) with keygen and signing protocols
- Implement gRPC clients for Server Party service
- Add MPC session endpoints to Account service
- Deploy 3 Server Party instances in docker-compose
- Add MarkPartyReady and StartSession to proto definitions
- Complete integration tests for 2-of-3, 3-of-5, 4-of-7 thresholds
- Add comprehensive documentation (architecture, API, testing, deployment)
Test results:
- 2-of-3: PASSED (keygen 93s, signing 80s)
- 3-of-5: PASSED (keygen 198s, signing 120s)
- 4-of-7: PASSED (keygen 221s, signing 150s)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add sessionRepo to HTTP handler for database operations
- Implement MarkPartyReady handler to update participant status
- Implement StartSession handler to start MPC sessions
- Update CanStart() to accept participants in 'ready' status
- Make Start() method idempotent to handle automatic + explicit starts
- Fix repository injection through dependency chain in main.go
- Add party_id parameter to test completion request