it0/packages/services
hailin 0d5441f720 fix(wecom): token mutex, leader lease, backoff, watchdog
Four additional robustness fixes:

1. **Token refresh mutex**: tokenRefreshPromise deduplicates concurrent
   refresh calls. All callers share one in-flight HTTP request instead
   of each firing their own, eliminating the race between concurrent
   refreshes.

2. **Distributed leader lease**: the service_state table is used for a
   TTL-based leader election (LEADER_LEASE_TTL_S=90s). Only one
   agent-service instance polls at a time; others skip until the lease
   expires. The lease is released automatically on graceful shutdown.

3. **Exponential backoff**: consecutive poll errors increment a counter n;
   next delay = min(10s × 2^(n-1), 5min). Prevents log spam and
   reduces load during sustained WeCom API outages. The counter resets on
   any successful poll.

4. **Watchdog timer**: setInterval every 2min checks lastPollAt.
   If the poll loop has been silent for >5min, it clears the timer and
   reschedules immediately, recovering from a silent crash of the loop.
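
A minimal sketch of the promise-sharing idea in item 1, not the actual agent-service code. Only the name tokenRefreshPromise comes from the commit message; the `TokenCache` class, the `AccessToken` shape, and the injected `fetchToken` function are hypothetical stand-ins for the real WeCom token request.

```typescript
// Hypothetical token shape; the real field names are not shown in the commit.
interface AccessToken {
  value: string;
  expiresAt: number; // epoch milliseconds
}

class TokenCache {
  private token: AccessToken | null = null;
  // The single in-flight refresh, or null when no refresh is running.
  private tokenRefreshPromise: Promise<AccessToken> | null = null;

  constructor(
    // Stand-in for the HTTP call that fetches a fresh token (assumption).
    private readonly fetchToken: () => Promise<AccessToken>,
    private readonly now: () => number = Date.now,
  ) {}

  async getToken(): Promise<string> {
    if (this.token !== null && this.token.expiresAt > this.now()) {
      return this.token.value; // cached token is still valid
    }
    if (this.tokenRefreshPromise === null) {
      // The first caller starts the refresh; later callers reuse this promise.
      this.tokenRefreshPromise = this.fetchToken().then(
        (fresh) => {
          this.token = fresh;
          this.tokenRefreshPromise = null;
          return fresh;
        },
        (err) => {
          // Clear on failure too, so the next call can retry.
          this.tokenRefreshPromise = null;
          throw err;
        },
      );
    }
    const refreshed = await this.tokenRefreshPromise;
    return refreshed.value;
  }
}
```

Because the promise is stored synchronously before the first `await`, callers that arrive while the request is in flight all see the same promise and no second request is sent.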
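
The lease rule in item 2 can be sketched as below, under stated assumptions. LEADER_LEASE_TTL_S and the service_state table are from the commit message; the row shape, the `InMemoryLeaseTable` class, and its method names are hypothetical. An in-memory object stands in for the table so the example runs on its own; against a real database the same check-and-write would need to be a single atomic statement.

```typescript
const LEADER_LEASE_TTL_S = 90; // value from the commit message

// Hypothetical row shape; the real service_state columns are not shown.
interface LeaseRow {
  holderId: string;
  expiresAtMs: number;
}

// In-memory stand-in for the service_state table (assumption for this sketch).
class InMemoryLeaseTable {
  private row: LeaseRow | null = null;

  // Acquire or renew the lease. Returns true if instanceId is leader afterwards.
  tryAcquire(instanceId: string, nowMs: number): boolean {
    const free = this.row === null || this.row.expiresAtMs <= nowMs;
    const mine = this.row !== null && this.row.holderId === instanceId;
    if (free || mine) {
      this.row = {
        holderId: instanceId,
        expiresAtMs: nowMs + LEADER_LEASE_TTL_S * 1000,
      };
      return true;
    }
    return false; // another instance holds an unexpired lease: skip this poll
  }

  // Called on graceful shutdown so a standby does not wait out the TTL.
  release(instanceId: string): void {
    if (this.row !== null && this.row.holderId === instanceId) {
      this.row = null;
    }
  }
}
```

Each instance would call `tryAcquire` before every poll cycle and poll only when it returns true; if the leader dies without releasing, a standby takes over once the 90s lease expires.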
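
The backoff formula in item 3 works out as follows. This is a sketch: the 10s base and 5min cap are from the commit message, while the function and class names are hypothetical, and treating n < 1 as 1 is an assumption.

```typescript
const BASE_DELAY_MS = 10 * 1000; // 10s, from the commit message
const MAX_DELAY_MS = 5 * 60 * 1000; // 5min cap, from the commit message

// n is the number of consecutive failed polls (1 for the first failure).
// Treating n < 1 as 1 is an assumption of this sketch.
function backoffDelayMs(n: number): number {
  const exponent = Math.max(1, Math.floor(n)) - 1;
  return Math.min(BASE_DELAY_MS * Math.pow(2, exponent), MAX_DELAY_MS);
}

class PollBackoff {
  private consecutiveErrors = 0;

  // Call after a failed poll; returns the delay before the next attempt.
  onFailure(): number {
    this.consecutiveErrors += 1;
    return backoffDelayMs(this.consecutiveErrors);
  }

  // Call after any successful poll; the next failure starts again at 10s.
  onSuccess(): void {
    this.consecutiveErrors = 0;
  }
}

// Prints the delays for n = 1..7: 10s, 20s, 40s, 80s, 160s, 300s, 300s
const delays = [1, 2, 3, 4, 5, 6, 7].map((n) => backoffDelayMs(n));
console.log(delays.map((ms) => ms / 1000 + "s").join(", "));
```

The cap is reached at the sixth consecutive failure, since 10s × 2^5 = 320s exceeds 5min.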
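
A rough sketch of the watchdog in item 4. The 2min check interval, the 5min threshold, and the name lastPollAt are from the commit message; the `PollLoop` class, the injected `pollOnce` callback, the 10s regular interval, and the choice to stamp lastPollAt at the start of each poll are assumptions made for illustration.

```typescript
const WATCHDOG_INTERVAL_MS = 2 * 60 * 1000; // check every 2min
const STALL_THRESHOLD_MS = 5 * 60 * 1000; // silent for more than 5min = stalled
const POLL_INTERVAL_MS = 10 * 1000; // assumed regular poll interval

class PollLoop {
  private pollTimer: ReturnType<typeof setTimeout> | null = null;
  private watchdogTimer: ReturnType<typeof setInterval> | null = null;
  lastPollAt: number;

  constructor(
    private readonly pollOnce: () => Promise<void>, // hypothetical poll callback
    private readonly now: () => number = Date.now,
  ) {
    this.lastPollAt = now();
  }

  start(): void {
    this.schedule(0);
    this.watchdogTimer = setInterval(() => this.checkStall(), WATCHDOG_INTERVAL_MS);
  }

  stop(): void {
    if (this.pollTimer !== null) clearTimeout(this.pollTimer);
    if (this.watchdogTimer !== null) clearInterval(this.watchdogTimer);
    this.pollTimer = null;
    this.watchdogTimer = null;
  }

  // Returns true if the loop was found stalled and has been rescheduled.
  checkStall(): boolean {
    if (this.now() - this.lastPollAt <= STALL_THRESHOLD_MS) return false;
    this.schedule(0); // clears any stale timer, then polls immediately
    return true;
  }

  private schedule(delayMs: number): void {
    if (this.pollTimer !== null) clearTimeout(this.pollTimer);
    this.pollTimer = setTimeout(() => void this.tick(), delayMs);
  }

  private async tick(): Promise<void> {
    this.lastPollAt = this.now();
    try {
      await this.pollOnce();
    } catch (_err) {
      // Real code would log the error and apply backoff here.
    }
    this.schedule(POLL_INTERVAL_MS);
  }
}
```

The watchdog only helps when the loop stops rescheduling itself, for example if a poll promise never settles; ordinary poll errors are already handled inside `tick`.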

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 05:54:42 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| agent-service | fix(wecom): token mutex, leader lease, backoff, watchdog | 2026-03-10 05:54:42 -07:00 |
| audit-service | fix(auth): allow platform_admin to access all web-admin endpoints | 2026-03-07 05:54:05 -08:00 |
| auth-service | fix(auth): add name to JWT payload, fix phone-user session restore | 2026-03-09 08:23:26 -07:00 |
| billing-service | fix: store tenant slug (not UUID) in current_tenant; remove plan trial periods | 2026-03-07 09:01:21 -08:00 |
| comm-service | fix: release QueryRunner connections to prevent pool exhaustion | 2026-02-23 15:55:06 -08:00 |
| inventory-service | feat(openclaw): Phase 1 - server pool + agent instance deployment infrastructure | 2026-03-07 11:11:21 -08:00 |
| monitor-service | fix: release QueryRunner connections to prevent pool exhaustion | 2026-02-23 15:55:06 -08:00 |
| notification-service | fix(push): fix TypeScript Map type inference error in OfflinePushService | 2026-03-10 04:50:11 -07:00 |
| ops-service | fix(ops-service): add new TenantInfo quota fields to inline TenantContextService.run calls | 2026-03-04 00:04:36 -08:00 |
| presence-service | fix(presence-service): use linux-musl-openssl-3.0.x Prisma binary target for Alpine | 2026-03-07 18:19:13 -08:00 |
| referral-service | feat(referral): add user-level personal circle + points system | 2026-03-08 00:18:17 -08:00 |
| version-service | feat(auth): add platform_super_admin role for two-level platform access control | 2026-03-07 01:17:27 -08:00 |
| voice-agent | feat(voice): randomly pick thinking sound from all 7 built-in clips per session | 2026-03-09 06:56:31 -07:00 |
| voice-service | feat: add engine type selection (Agent SDK / Claude API) for voice calls | 2026-03-02 02:11:51 -08:00 |