Commit Graph

114 Commits

Author SHA1 Message Date
hailin df0a041faa chore(docker): 为 mpc-system、api-gateway、infrastructure 添加时区配置
统一所有 Docker 服务时区为 Asia/Shanghai:

mpc-system:
- docker-compose.yml: postgres, session-coordinator, message-router, server-party-1/2/3, server-party-api, account-service
- docker-compose.prod.yml: postgres, message-router, session-coordinator, account-service, server-party-api
- docker-compose.party.yml: postgres, server-party

api-gateway:
- kong-db, kong-migrations, kong

infrastructure:
- consul, jaeger, grafana, minio

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 18:35:09 -08:00
hailin a01284678d feat(wallet/mpc): 增强提现和充值流程可靠性
## 主要改进

### MPC 签名系统 (mpc-system)
- 添加签名缓存机制,避免重复签名请求
- 修复 yParity 恢复逻辑,确保签名格式正确
- 优化签名完成报告流程

### 区块链服务 (blockchain-service)
- EIP-1559 降级为 Legacy 交易(KAVA 测试网兼容)
- 修复 gas 估算逻辑

### 钱包服务 (wallet-service)
- 添加乐观锁机制 (version 字段) 防止并发修改
- 提现确认流程添加事务保护 + 乐观锁
- 提现失败时正确解冻 amount + fee
- 充值流程添加事务保护 + 乐观锁
- Kafka consumer 添加错误重抛,触发重试机制

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 19:47:20 -08:00
hailin 0c00382a98 fix: convert deploy.sh CRLF to LF and add executable permission
- Convert Windows CRLF line endings to Unix LF for all deploy.sh files
- Add executable permission to all deploy.sh scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-07 07:01:13 -08:00
hailin e76adcbe8d . 2025-12-07 14:56:13 +00:00
hailin b25a893d37 docs(config): update .env.example files for production deployment
- Update all .env.example files with production topology documentation
- Add network configuration for Server A (119.145.15.38/192.168.1.100) and Server B (192.168.1.111)
- Document service ports and connection URLs for all microservices
- Add architecture diagrams in comments for easy reference
- Include security notes and key generation commands

Files updated:
- backend/services/.env.example (main config)
- backend/services/identity-service/.env.example
- backend/services/mpc-service/.env.example
- backend/services/blockchain-service/.env.example
- backend/mpc-system/.env.example
- backend/api-gateway/.env.example
- backend/infrastructure/.env.example

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-07 04:55:21 -08:00
hailin 9fc41cfa53 fix: add keygen index to sorted index mapping for signing session
When signing with a subset of parties (e.g., party-1 and party-3 in 2-of-3),
the TSS library creates a sorted array of party IDs. Messages contain the
original keygen party index, but we need to map it to the sorted array index.

This fixes the 'invalid FromPartyIndex' error when signing with non-consecutive
party indices.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 11:04:19 -08:00
hailin f769c7eebf test: update signing test username 2025-12-06 10:54:22 -08:00
hailin ac4d9283dc fix: preserve original PartyIndex from keygen for signing sessions
- Add PartyIndex field to protobuf ParticipantInfo message
- Pass original PartyIndex from account shares to session coordinator
- Use original PartyIndex instead of loop variable when creating participants
- This fixes TSS signing failures when non-consecutive parties are selected
2025-12-06 10:45:05 -08:00
hailin 1d507a7afd test: update signing test to use wallet with configured parties 2025-12-06 10:34:14 -08:00
hailin 8dd1c50eb9 fix: update test username for signing parties API test 2025-12-06 10:29:30 -08:00
hailin 1044cfe635 fix: correct signing parties count validation to T+1 (required signers for TSS) 2025-12-06 10:20:21 -08:00
hailin 47a98da4e4 test: add signing parties API test script 2025-12-06 10:18:19 -08:00
hailin 93eab1931e test: update wallet username
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 10:08:17 -08:00
hailin dbe630dbd6 fix: add wait time before TSS protocol to prevent race condition
Wait 500ms after subscribing to messages to ensure all parties have
completed subscription before starting TSS protocol. This prevents
broadcast messages from being lost when some parties haven't subscribed yet.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 10:04:10 -08:00
hailin 0e8dff0371 test: update wallet username for signing test
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 10:01:56 -08:00
hailin 98731cc133 debug: add more logging to message broker for broadcast diagnostics
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 09:57:34 -08:00
hailin c257ad1639 test: update test_signing.go with new wallet username
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 09:52:58 -08:00
hailin 378970048b debug: add TSS signing debug logs to diagnose stuck issue
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 09:41:31 -08:00
hailin f70ece0d4f test: update test_signing.go to use current wallet username
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 09:33:58 -08:00
hailin fd74bc825a chore: add detailed logging for keygen_session_id tracing
Add logging at key points to trace keygen_session_id flow:
- Account Handler: log keygen_session_id when creating signing session
- Session Coordinator: log keygen_session_id in CreateSession and JoinSession
- Message Router: log keygen_session_id when proxying JoinSession
- Server Party: log keygen_session_id when joining session

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 09:21:19 -08:00
hailin a1b2b760ab feat(migration): add keygen_session_id column to mpc_sessions table
For sign sessions, this column stores the reference to the keygen session
whose key shares should be used for signing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-06 09:16:31 -08:00
hailin 3d176e1132 feat: complete keygen_session_id implementation for signing sessions
- Regenerate protobuf Go code with KeygenSessionId fields
- Session Coordinator correctly parses, stores, and returns keygen_session_id
- Message Router Client parses keygen_session_id in JoinSession response
- participate_signing.go uses keygen_session_id for precise share lookup
- Database schema already includes keygen_session_id column

This fixes the signing issue where wrong keyshares were loaded for multi-account scenarios.
2025-12-06 08:57:30 -08:00
hailin 23eff00d76 feat: add KeygenSessionID to MPCSession entity
- Add KeygenSessionID field to MPCSession struct for tracking which keygen's shares to use
- This is the first step in完整的修复流程
2025-12-06 08:40:38 -08:00
hailin 382386733d feat: add keygen_session_id to signing session flow
- Add keygen_session_id field to CreateSessionRequest and SessionInfo protobuf
- Modify CreateSigningSessionAuto to accept and pass keygenSessionID
- Update Account Handler to pass account's keygen_session_id when creating signing session
- This enables parties to load the correct keyshare by session ID
2025-12-06 08:39:40 -08:00
hailin 7660868a38 fix(account): select t+1 parties for threshold signing
TSS threshold semantics: for threshold parameter t, the required number of signers is t+1.
For 2-of-3 with t=2, we need 2+1=3 signers (all parties must participate).

Previous error: 't+1=3 is not satisfied by the key count of 2'
Fix: Changed from selecting t parties to selecting t+1 parties.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 07:46:32 -08:00
hailin 0ea64e02ae fix(account): use only threshold_t parties for signing instead of all active parties
For 2-of-3 threshold signing, only 2 parties should participate in signing, not all 3. This fixes the 'failed to calculate Bob_mid' error that occurred when all parties tried to sign.

Changes:
- Modified CreateSigningSession to select exactly threshold_t parties when no signing config exists
- For 2-of-3: now selects 2 parties instead of all 3
- Added logging to show party selection details

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 07:35:03 -08:00
hailin 672b6e1630 feat(schema): make email field optional in accounts table
Only username is required, all other fields (email, phone, public_key, etc.) are now optional.

Changes:
- Modified 001_init_schema.up.sql to remove NOT NULL constraints
- Added partial unique index for email (only for non-NULL values)
- Created migration 006_make_email_optional for existing databases
- Set default status to 'active'

This allows automatic account creation from keygen without requiring user info.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 07:16:34 -08:00
hailin eb63b9341b fix(tss): correct threshold signing to support t-of-n properly
Previously, signing incorrectly required all n parties from keygen to participate. For 2-of-3 threshold, it required all 3 parties instead of just 2.

Root cause: tss.NewParameters was using len(currentSigners) instead of the original n from keygen.

Changes:
- Added TotalParties field to SigningConfig to store original n from keygen
- Modified participate_signing.go to read threshold_n from database
- Updated tss.NewParameters to use TotalParties instead of current signer count
- Added logging to show t, n, and current_signers

For 2-of-3: threshold_t=2, threshold_n=3, any 2 parties can now sign.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 07:16:24 -08:00
hailin 6fdd2905b1 test(signing): add signing session test script
Created test_signing.go to test MPC signing functionality:
- Generates JWT token for authentication
- Creates SHA-256 hash of test message
- Calls POST /api/v1/mpc/sign API
- Tests signing with persistent parties (non-delegate mode)

Usage: go run test_signing.go

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 06:58:54 -08:00
hailin e786219f37 debug(keygen): add detailed logging for message flow tracking
Added comprehensive debug logging to track message conversion and
party index mapping in keygen protocol:

1. Log party index map construction with all participants
2. Log received MPC messages before conversion
3. Log when messages are dropped due to unknown sender
4. Log successful message conversion and TSS forwarding
5. Show known_parties map when dropping messages

This will help identify why delegate party receives messages but
doesn't process them during keygen.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 06:45:23 -08:00
hailin 5344af465b fix(server-party): fix context leak in GetPendingMessages acknowledgment
Fixed the acknowledgment goroutine in GetPendingMessages to use parent
context instead of context.Background(), preventing orphan goroutines
that can't be cancelled.

This completes all context bug fixes:
- server-party-api event handler (commit 450163a)
- server-party event handler (commit 99ff3ac)
- message acknowledgment in SubscribeMessages (commit 450163a)
- message acknowledgment in GetPendingMessages (this commit)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 06:42:07 -08:00
hailin 99ff3ac130 fix(server-party): use parent context in event handler for proper cancellation
- Fixed server-party event handler to use parent context with timeout
- Prevents orphan goroutines when session fails or party exits
- Consistent with server-party-api fix
2025-12-06 06:39:23 -08:00
hailin 450163a94d fix(context): use parent context instead of Background() to allow proper cancellation
- Fixed delegate party event handler to use parent context with timeout
- Fixed message acknowledgment to use parent context
- Prevents orphan goroutines when session fails or party exits
- Resolves system crash after delegate party failure
2025-12-06 06:36:34 -08:00
hailin 3adc091140 fix(docker): add PARTY_ROLE environment variable for server-party-api
Add PARTY_ROLE=delegate environment variable to server-party-api service
to fix nil pointer dereference when determining party role during keygen.

Without this variable, the party defaults to "persistent" role which tries
to access keyShareRepo (nil for delegate parties), causing a panic.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 06:00:28 -08:00
hailin 13e81e37c9 fix(db): update repository to save and load delegate_party_id field
Update session repository to properly handle delegate_party_id column:
- Add delegate_party_id to Save method INSERT and UPDATE statements
- Add DelegatePartyID field to sessionRow struct
- Update FindByUUID, FindByStatus, FindExpired, FindActive SELECT queries
- Update scanSessions method to scan and pass delegate_party_id
- Remove placeholder empty string, now loads actual value from database

This completes the delegate party functionality by ensuring the delegate party ID
is persisted and retrieved correctly from the database.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 05:52:34 -08:00
hailin 391448063f feat(db): add delegate_party_id column to mpc_sessions table
Add delegate_party_id column to track which party is acting as delegate
(generates and returns user share instead of storing it).

Changes:
- Add delegate_party_id VARCHAR(255) column with default empty string
- Add partial index for faster lookups when delegate party is present
- Include up and down migrations

This fixes the issue where delegate party selection worked but the delegate_party
field was not being returned in API responses due to missing database column.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 05:50:01 -08:00
hailin 36e1359f43 fix(session-coordinator): pass PartyComposition from gRPC request to use case
Fixed critical bug where PartyComposition (persistent/delegate party counts) was being sent
by account-service in gRPC request but was not being extracted and passed to the CreateSession
use case, causing delegate party selection to fail.

Changes:
- Extract PartyComposition from protobuf request and pass to CreateSessionInput
- Add logging for party composition values in gRPC handler
- Return delegate_party_id and selected_parties in CreateSessionResponse
- Load session after creation to get delegate party ID

This fixes the issue where require_delegate=true had no effect and all parties selected
were persistent parties instead of 2 persistent + 1 delegate.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 05:38:38 -08:00
hailin c5d3840835 fix(docker-compose): add ACCOUNT_SERVICE_ADDR to session-coordinator
- Add ACCOUNT_SERVICE_ADDR environment variable pointing to account-service:8080
- Fixes "connection refused" error when session-coordinator tries to auto-create account after keygen
- Session-coordinator can now properly call account service to create account records

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 05:21:06 -08:00
hailin b8d66921e0 feat(docker-compose): add PARTY_ID to server-party-api configuration
- Add explicit PARTY_ID environment variable for delegate party
- Set PARTY_ID=delegate-party for server-party-api service
- This ensures the delegate party properly registers to Message Router party pool
- Enables delegate party selection for keygen sessions with require_delegate=true

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 05:17:03 -08:00
hailin 5f12404be4 fix: remove dynamic participant join to fix concurrent party_index assignment
- Remove dynamic participant addition in JoinSession
- Participants must be pre-created in CreateSession
- Add ErrPartyNotInvited error for unauthorized join attempts
- Fix Redis adapter to include version parameter in ReconstructSession
- This fixes VSS verification failures caused by inconsistent party indices
2025-12-06 04:54:40 -08:00
hailin b72268c1ce feat(mpc-system): implement optimistic locking for session updates
Implement version-based optimistic locking to prevent concurrent update conflicts
when multiple parties simultaneously report completion during keygen operations.

Changes:
- Add version column to mpc_sessions table (migration 004)
- Add Version field to MPCSession entity
- Define ErrOptimisticLockConflict error
- Update SessionPostgresRepo.Update() to check version and increment on success
- Add automatic retry logic (max 3 attempts) to ReportCompletionUseCase
- Update Save and all query methods (FindByStatus, FindExpired, etc.) to handle version field

This replaces pessimistic locking (FOR UPDATE) with optimistic locking using
the industry-standard pattern: WHERE version = $n and checking rowsAffected.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 04:16:32 -08:00
hailin 63e00a64f5 fix(test): update JWT secret to match .env configuration
Fixed JWT secret in test_create_session.go to use the same secret key
as configured in .env file, resolving 401 Unauthorized errors during
keygen session creation tests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 03:54:12 -08:00
hailin 77fa40d27f test(logger): set Development=true to test if it affects debug logging
Changed Development from false to true to test if this is preventing
debug logs from being output. Development mode may affect how the
logger handles different log levels.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 03:41:50 -08:00
hailin 47dd2d1cb5 test(logger): add internal debug test immediately after Build()
Added Log.Debug() and Log.Info() calls immediately after Build()
to test if the logger can output debug logs right after creation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 03:41:00 -08:00
hailin 3a247562ea debug(logger): add AtomicLevel tracking to diagnose level changes
Added debug output to track:
1. AtomicLevel value when created
2. AtomicLevel value after Build()
3. Log.Level() value after Build()

This will help identify if Build() or something else is changing the level.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 03:32:45 -08:00
hailin bfe129da51 test(logger): add debug log test to verify debug level works
Added test debug log immediately after logger initialization.
If debug logging is working, we should see this message.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 03:23:24 -08:00
hailin bac623f63c debug(logger): add detailed debug output for level initialization
Added println statements to trace:
1. Level value after UnmarshalText
2. Logger level after Build()

This will help diagnose why debug level is not being applied.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 03:16:05 -08:00
hailin ac1858a19e fix(logger): remove init() function that was overriding config level
Problem: Logger was always using info level despite MPC_LOGGER_LEVEL=debug
Root cause: The init() function in logger.go was calling InitProduction()
which created a zap.NewProduction() logger with hardcoded info level.
This happened before main() called logger.Init(cfg), so the config was
being ignored.

Solution:
1. Removed init() function to prevent early logger initialization
2. Added zap.ReplaceGlobals() in Init() to ensure config takes effect
3. Removed unused "os" import

References:
- https://pkg.go.dev/go.uber.org/zap
- https://stackoverflow.com/questions/57745017/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-06 03:07:30 -08:00
hailin 32e3970f34 debug: add logger level debug output 2025-12-06 02:56:26 -08:00
hailin 5764f3d50d chore: set logger level to debug for debugging 2025-12-06 02:42:04 -08:00