FileType.custom with allowedExtensions causes Android system picker
to hide subdirectories on some devices. Changed to FileType.any with
post-selection extension validation instead.
- Unsupported file types are skipped with a SnackBar hint
- Allowed: jpg, jpeg, png, gif, webp, pdf
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- AppBar background transparent, merges with scaffold for seamless look
- toolbarHeight reduced from 64dp to 44dp (~20dp screen space saved)
- scrolledUnderElevation: 0 prevents Material 3 shadow on scroll
- Icons 24→20px with VisualDensity.compact for tighter action buttons
- Title fontSize 16 w600, less visual weight
- Both dark and light themes updated consistently
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Input area redesign (ChatGPT/Claude App style):
- Replace fixed bottom bar with floating pill overlay using Stack+Positioned
- Semi-transparent background (surface 92% opacity) with rounded corners (28px)
- Drop shadow for depth separation from content
- Remove inner TextField border (InputBorder.none) for cleaner look
- ListView bottom padding increased to 80px to leave room under the pill
- Input pill floats 12px from edges, 8px from bottom
History scroll fix:
- Add jump parameter to _scrollToBottom() for instant positioning
- When loading conversation history (empty→many messages), use jumpTo
instead of animateTo to avoid incomplete scroll on large message lists
- Double-frame jumpTo ensures layout settles before final scroll position
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove redundant "从剪贴板粘贴" option from attachment menu (long-press to paste natively)
- Remove super_clipboard dependency (no longer needed)
- Fix timeline vertical line overlapping icon nodes by using dynamic dotRadius
- Dim input field placeholder color to AppColors.textMuted
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
getFile requires two positional args: format and callback.
Wrapped in Completer for async/await usage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add super_clipboard and file_picker dependencies
- Clipboard paste: reads PNG/JPEG image data from system clipboard
- Multi-image: pickMultiImage with remaining count limit
- File picker: supports images (jpg/png/gif/webp) and PDF files
- Updated attachment preview to show file icon for non-image types
- Bottom sheet now shows 4 options: gallery, camera, clipboard, file
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two major features in this commit:
1. Streaming Markdown Rendering Optimization
- Replace deprecated flutter_markdown with gpt_markdown (active, AI-optimized)
- Real-time markdown rendering during streaming (was showing raw syntax)
- Solid block cursor (█) instead of AnimationController blink
- 80ms token throttle buffer reducing rebuilds from per-token to ~12.5/sec
- RepaintBoundary isolation for markdown widget repaints
- StreamTextWidget simplified from StatefulWidget to StatelessWidget
2. Multimodal Image Input (camera + gallery + display)
- Flutter: image_picker for gallery/camera, base64 encoding, attachment
preview strip with delete, thumbnails in sent messages
- Data layer: List<String>? → List<Map<String, dynamic>>? for structured
attachment payloads through datasource/repository/usecase
- ChatAttachment model with base64Data, mediaType, fileName
- ChatMessage entity + ChatMessageModel both support attachments field
- Backend DTO, Entity (JSONB), Controller, ConversationContextService
all extended to receive, store, and reconstruct Anthropic image
content blocks in loadContext()
- Claude API engine skips duplicate user message when history already
ends with multimodal content blocks
- NestJS body parser limit raised to 10MB for base64 image payloads
- Android CAMERA permission added to manifest
- Image.memory uses cacheWidth/cacheHeight for memory efficiency
- Max 5 images per message enforced in UI
Data flow:
ImagePicker → base64Encode → ChatAttachment → POST body →
DB (JSONB) → loadContext → Anthropic image content blocks → Claude API
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three root causes fixed:
1. TimelineEventNode: Replaced IntrinsicHeight (which forces intrinsic
height calculation on unbounded content) with CustomPaint-based
_TimelineLinePainter that draws vertical lines based on actual
rendered widget size. Also added maxLines/ellipsis to label text
and mainAxisSize.min on inner Column.
2. ApprovalActionCard: Changed countdown + action buttons layout from
Row with Spacer (which requires infinite width) to Wrap with
spacing, preventing horizontal overflow on narrow screens.
3. AnimatedCrossFade in _CollapsibleCodeBlock and _CollapsibleThinking:
Wrapped with ClipRect and added sizeCurve: Curves.easeInOut to
prevent the outgoing child from extending beyond parent bounds
during the cross-fade transition.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6 rounds of systematic audit identified and fixed 14 bugs across
backend controller and Flutter client:
## Backend (agent.controller.ts)
Security & Tenant Isolation:
- Add @TenantId + ForbiddenException check to cancelTask, injectMessage,
approveCommand — all 4 write endpoints now enforce tenant isolation
- Add tenantId check on session reuse in executeTask to prevent
cross-tenant session hijacking
Architecture & Correctness:
- Extract shared runTaskStream() from inline fire-and-forget block,
used by both executeTask and injectMessage to reduce duplication
- Use session.engineType (not getActiveEngine()) in cancelTask,
injectMessage, approveCommand — fixes wrong-engine-cancel when
global engine config is switched after task creation
- Add concurrent task prevention: executeTask checks for existing
RUNNING task on same session and cancels it before starting new one
- Add runningTasks Map to track task promises, awaitTaskCleanup()
helper with 3s timeout for inject to wait for partial text save
- captureSdkSessionId() captures SDK session ID into metadata
without DB save (callers persist), preventing fire-and-forget race
Cancel/Reject Improvements:
- cancelTask: idempotent (returns early if already CANCELLED/COMPLETED),
session stays 'active' (was 'cancelled'), emits cancelled WS event
- approveCommand reject: session stays 'active' (was 'cancelled'),
now emits cancelled WS event so Flutter stream listeners clean up
- approveCommand approved: collect text events and save assistant
response to conversation history on completion (was missing)
Minor:
- task.result! non-null assertion → task.result ?? 'Unknown error'
- Add findRunningBySessionId() to TaskRepository
## Flutter
API Contract Fix:
- approveCommand: route changed from /api/v1/ops/approvals/:id/approve
to /api/v1/agent/tasks/:id/approve with {approved: true} body
- rejectCommand: route changed from /api/v1/ops/approvals/:id/reject
to /api/v1/agent/tasks/:id/approve with {approved: false} body
Resource Management:
- ChatNotifier.dispose() now disconnects WebSocket to prevent
connection leak when navigating away from chat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend (agent-engine.port.ts):
- Add `cancelled` event type: emitted when a task is cancelled (user-initiated
or injection), so Flutter can close the old stream cleanly
- Add `task_info` event type: emitted after inject to pass the new taskId to
the client, enabling cancel/re-inject on the replacement task
Flutter (features/chat/):
- ChatState: track current `taskId` alongside `sessionId`; clear on completion
or error
- Handle `TaskInfoEvent`: update taskId in state when server issues a new task
- Handle `CancelledEvent`: treat as stream termination (agentStatus → idle)
- MessageType.interrupted: new UI node (warning style) for mid-stream cancels
- _inject(): send text as an inject request while streaming; backend cancels
the current task and starts a new one with the injected message
- Input area: during streaming, hint changes to "追加指令...", Enter key calls
_inject() instead of _send(), and both inject-send + stop buttons are shown
- isAwaitingApproval kept separate from isStreaming so approval flow is not
blocked by inject mode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Server side (session_router.py):
- /reconnect now accepts sessions in "active" state (not just "disconnected")
- When client reconnects to an active session, the old WebSocket/pipeline is
automatically replaced when the new WebSocket connects
- Only truly terminal states (e.g. "ended") return 409
Flutter side (agent_call_page.dart):
- Distinguish terminal errors (404 session gone, 409 ended) from transient
errors (network timeout, server unreachable) in reconnect loop
- Terminal errors break immediately instead of wasting retry attempts
- Extract _connectWebSocket() helper for cleaner reconnect flow
- Add DioException handling for proper HTTP status code inspection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FlutterSoundPlayer.feedUint8FromStream() requires interleaved mode.
With interleaved=false, every feed() call threw:
"Cannot feed with UInt8 with non interleaved mode"
- feedUint8FromStream (Uint8List) → requires interleaved: true
- feedFromStream (Float32List) → requires interleaved: false
Since we feed raw PCM bytes (Uint8List), interleaved must be true.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: PcmPlayer called openPlayer() without audio session config,
so Android defaulted to earpiece-only mode. When the mic was actively
recording, playback was silently suppressed — the agent's TTS audio was
sent successfully over WebSocket but never reached the speaker.
Changes:
1. PcmPlayer (pcm_player.dart):
- Added audio_session package for proper audio session management
- Configure AudioSession with playAndRecord category so mic + speaker
work simultaneously
- Set voiceCommunication usage to enable Android hardware AEC (echo
cancellation) — prevents feedback loops when speaker is active
- defaultToSpeaker routes output to loudspeaker instead of earpiece
- Restored setSpeakerOn() method stub (used by UI toggle)
2. AgentCallPage (agent_call_page.dart):
- Fixed fire-and-forget bug: _pcmPlayer.feed() returns Future but was
called without await, causing interleaved feedUint8FromStream calls
- Added _feedChain serializer to guarantee sequential audio feeding
3. Dependencies:
- Added audio_session package to pubspec.yaml
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Local /synthesize and /transcribe endpoints now auto-load Kokoro/Whisper
models on first call instead of returning 503 when not pre-loaded at
startup. This allows switching between Local and OpenAI providers in the
Flutter test page without requiring server restart.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: IOWebSocketChannel.sink.close() can hang indefinitely
(dart-lang/web_socket_channel#185). Previous fix used unawaited close
but didn't cancel the stream subscription, so the old listener could
still push events to _messageController.
Fix: Extract _closeCurrentConnection() that:
1. Cancels StreamSubscription first (stops duplicate events immediately)
2. Fire-and-forget sink.close(goingAway) (frees underlying socket)
This follows the workaround recommended in the official issue tracker.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The await on sink.close() blocks indefinitely when the server doesn't
respond to the close handshake. Use fire-and-forget with unawaited()
so the new connection can proceed immediately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When sending a second message in the same session, the old WebSocket
connection was not closed, causing both connections to subscribe to the
same session room. This resulted in each text event being received twice,
producing garbled/duplicated output text.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement DB-based conversation message storage (engine-agnostic) that
works across both Claude API and Agent SDK engines. Add ChatGPT/Claude-style
conversation history drawer in Flutter with date-grouped session list,
session switching, and new chat functionality.
Backend: entity, repository, context service, migration 004, session/message
API endpoints. Flutter: ConversationDrawer, sessionId flow from backend
response via SessionInfoEvent, session list/switch/delete support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove shared interceptors from the binary Dio instance to prevent
request dedup/retry interceptors from interfering with audio downloads.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded Colors.grey with Theme.of(context).colorScheme for
result containers and status text so they're readable in both light
and dark themes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add openai package to voice-service requirements
- Add /api/v1/test/tts/synthesize-openai (tts-1/tts-1-hd/gpt-4o-mini-tts)
- Add /api/v1/test/stt/transcribe-openai (gpt-4o-transcribe/whisper-1)
- Add OPENAI_API_KEY and OPENAI_BASE_URL env vars to voice-service
- Flutter test page: SegmentedButton to toggle Local/OpenAI provider
- All endpoints maintain same response format for easy comparison
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove record package dependency, use FlutterSoundRecorder instead
- Use permission_handler for microphone permission (already in pubspec)
- Proper temp file path via path_provider
- Cleanup temp files after upload
- Single package (flutter_sound) handles both recording and playback
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The record package requires a valid file path. Empty string caused
ENOENT (No such file or directory) on Android.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Claude API engine now uses streaming API (messages.stream) for real-time
text delta output instead of waiting for full response
- Agent controller accepts optional engineType body parameter to allow
callers (e.g. voice pipeline) to select a specific engine
- Fix voice_test_page.dart compilation error: replace audioplayers (not
installed) with flutter_sound (already in pubspec.yaml)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thinking content auto-expands while streaming, auto-collapses when done.
User can toggle with "Thinking ∨" button, matching Claude Code VSCode UX.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: Pipecat's WebsocketServerTransport creates its own WebSocket
server on (host,port) and expects FrameProcessor subclasses. Our code was
passing a FastAPI WebSocket object as 'host' and using plain STT/TTS/VAD
service classes that aren't FrameProcessors. The pipeline crashed immediately
when receiving audio, causing "disconnects when speaking".
Changes:
- **base_pipeline.py**: Complete rewrite — replaced Pipecat Pipeline with
direct async loop: WebSocket → VAD → STT → Claude LLM → TTS → WebSocket.
Supports barge-in (interrupt TTS when user speaks), audio chunking, and
24kHz→16kHz TTS resampling.
- **session_router.py**: Pass WebSocket directly to pipeline instead of
wrapping in AppTransport.
- **app_transport.py**: Deprecated (no longer needed).
- **kokoro_service.py**: Fix misaki compatibility (MutableToken→MToken
rename), use correct Chinese voice 'zf_xiaoxiao', handle torch tensors.
- **main.py**: Apply misaki monkey-patch before importing kokoro.
- **settings.py**: Change default TTS voice from 'zh_female_1' (non-existent)
to 'zf_xiaoxiao' (valid Kokoro-82M Chinese female voice).
- **requirements.txt**: Remove pipecat-ai dependency, pin kokoro==0.3.5 +
misaki==0.7.17, add Chinese NLP deps (pypinyin, cn2an, jieba, ordered-set).
- **agent_call_page.dart**: Wrap each cleanup step in try/catch to ensure
Navigator.pop() always executes after call ends. Add 3s timeout on session
delete request.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace traditional chat bubble layout with a Claude Code-inspired
timeline/workflow design:
- Vertical gray line connecting sequential event nodes
- Colored dots for each event (green=done, red=error, yellow=warning)
- Animated spinning asterisk (*) on active nodes
- Streaming text with blinking cursor in timeline nodes
- Tool execution shown as code blocks within timeline
- User prompts as distinct nodes with person icon
New file: timeline_event_node.dart (TimelineEventNode, CodeBlock)
Rewritten: chat_page.dart (timeline layout, no more bubbles)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend:
- Add includePartialMessages: true to SDK query options
- Handle stream_event/content_block_delta for real-time text streaming
- Skip text/thinking blocks from complete assistant messages (already
streamed via deltas) to avoid duplication
- Change default result summary to empty string
Flutter:
- Only show CompletedEvent summary when no assistant text was streamed
(prevents duplicate message bubble)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WebSocketChannel.connect does not accept headers parameter in
web_socket_channel 2.4.0. Use IOWebSocketChannel.connect instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WebSocket connections to /ws/agent were rejected by Kong (401)
because the Authorization header was not included. Now reads
access_token from secure storage and passes it in the WebSocket
upgrade request headers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses reliability gaps in the real-time voice WebSocket connection
between Flutter client and Python voice-service backend.
Backend (voice-service):
- Heartbeat: new _heartbeat_sender coroutine sends JSON ping text frames
every 15s alongside the Pipecat pipeline; failed send = dead connection
- Session preservation: on WebSocket disconnect, sessions are now marked
"disconnected" with a timestamp instead of being deleted, allowing
reconnection within a configurable TTL (default 60s)
- Reconnect endpoint: POST /sessions/{id}/reconnect verifies the session
is alive and in "disconnected" state, returns fresh websocket_url
- Reconnect-aware WS handler: detects "disconnected" sessions, cancels
stale pipeline tasks, creates a new pipeline, sends "session.resumed"
- Background cleanup: asyncio loop every 30s removes sessions that have
been disconnected longer than session_ttl
- Structured event protocol: text frames = JSON control messages
(ping/pong/session.resumed/session.ended/error), binary = PCM audio
- New settings: session_ttl (60s), heartbeat_interval (15s),
heartbeat_timeout (45s)
Flutter (agent_call_page.dart):
- Heartbeat monitoring: tracks last server ping timestamp, triggers
reconnect if no ping received in 45s (3 missed intervals)
- Auto-reconnect: exponential backoff (1s→2s→4s→8s→16s), max 5 attempts;
calls /reconnect endpoint to verify session, rebuilds WebSocket,
resets audio buffer, restarts heartbeat
- Reconnecting UI: yellow warning banner "重新连接中... (N/5)" with
spinner overlay during reconnection attempts
- WebSocket data routing: _onWsData distinguishes String (JSON control)
from binary (audio) frames, handles ping/session.resumed/session.ended
- User-initiated disconnect guard: _userEndedCall flag prevents reconnect
attempts when user intentionally hangs up
- session_id field compatibility: supports session_id/sessionId/id
Flutter (pcm_player.dart):
- Jitter buffer: queues incoming PCM chunks, starts playback only after
accumulating 4800 bytes (150ms at 16kHz 16-bit mono) to smooth out
network timing variance
- reset() method: clears buffer on reconnect to discard stale audio
- Buffer underrun handling: re-enters buffering phase if queue empties
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace traditional on-device speech_to_text with a modern pipeline:
- Record audio via `record` package with hardware noise suppression
- Apply GTCRN neural denoising (sherpa-onnx, ICASSP 2024, 48K params)
- Trim silence, POST to backend /voice/transcribe (faster-whisper)
Changes:
- Add /transcribe endpoint to voice-service for audio file upload
- Add SpeechEnhancer wrapper for sherpa-onnx GTCRN model (523KB)
- Rewrite chat_page.dart voice input: record → denoise → transcribe
- Keep NoiseReducer.trimSilence for silence removal only
- Upgrade record to v6.2.0, add sherpa_onnx, path_provider
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Web Admin:
- Update browser title to "iAgent Admin Console"
- Translation: en appTitle "iAgent", zh appTitle "我智能体"
- Sidebar: en "iAgent Admin", zh "我智能体"
- Settings placeholder: "iAgent Platform" / "我智能体平台"
- Update alt tags on logo images
Flutter:
- MaterialApp title: "iAgent"
- Chat: "Ask iAgent..." / "Start a conversation with iAgent"
- Terminal: "iAgent Remote Terminal"
- Agent call: "iAgent Calling" / "iAgent"
Logo SVG: text changed from "AI AGENT" to "iAgent"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add logo.svg (green robot character) to project root
- Web admin: replace text "IT" badge with SVG logo in sidebar
- Web admin: add logo image to login, register, invite pages
- Web admin: add SVG favicon and apple-touch-icon metadata
- Flutter: add flutter_svg dependency, replace text "IT0" with logo on login page
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>