fix: move voice instructions to systemPrompt, keep prompt clean

Previously, voice mode wrapped every user message with 【语音对话模式】 instructions, polluting conversation_messages history with repeated instructions on every turn. Now: - systemPrompt carries voice-mode instructions (set once, not per-message) - prompt contains only the clean user text (identical to text chat pattern) - Conversation history stays clean for multi-turn context Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 03:24:50 -08:00 · 2026-03-02 03:24:50 -08:00 · 02aaf40bb2
parent da17488389
commit 02aaf40bb2
1 changed files with 13 additions and 14 deletions
--- a/packages/services/voice-agent/src/plugins/agent_llm.py
+++ b/packages/services/voice-agent/src/plugins/agent_llm.py
@ -211,29 +211,28 @@ class AgentServiceLLMStream(llm.LLMStream):

            # 2. Create agent task (with timeout)
            engine_type = self._llm_instance._engine_type
-            prompt = user_text
-
-            # Agent SDK mode: instruct the agent to output concise spoken Chinese
-            # (skip tool-call details and intermediate steps)
-            if engine_type == "claude_agent_sdk":
-                prompt = (
-                    "【语音对话模式】你正在通过语音与用户实时对话。请严格遵守以下规则：\n"
-                    "1. 只输出用户关注的最终答案，不要输出工具调用过程、中间步骤或技术细节\n"
-                    "2. 用简洁自然的口语中文回答，像面对面对话一样\n"
-                    "3. 回复要简短精炼，适合语音播报，通常1-3句话即可\n"
-                    "4. 不要使用markdown格式、代码块、列表符号等文本格式\n"
-                    f"\n用户说：{user_text}"
-                )

            # Voice mode flag: tell agent-service to filter intermediate events
            # (tool_use, tool_result, thinking) — only stream text + completed + error
            voice_mode = engine_type == "claude_agent_sdk"

            body: dict[str, Any] = {
-                "prompt": prompt,
+                "prompt": user_text,  # always send clean user text (no wrapping)
                "engineType": engine_type,
                "voiceMode": voice_mode,
            }
+
+            # Agent SDK mode: set systemPrompt once (not per-message) so
+            # conversation history stays clean — identical to text chat pattern
+            if voice_mode:
+                body["systemPrompt"] = (
+                    "你正在通过语音与用户实时对话。请严格遵守以下规则：\n"
+                    "1. 只输出用户关注的最终答案，不要输出工具调用过程、中间步骤或技术细节\n"
+                    "2. 用简洁自然的口语中文回答，像面对面对话一样\n"
+                    "3. 回复要简短精炼，适合语音播报，通常1-3句话即可\n"
+                    "4. 不要使用markdown格式、代码块、列表符号等文本格式"
+                )
+
            if self._llm_instance._agent_session_id:
                body["sessionId"] = self._llm_instance._agent_session_id