Open-Source Agent Platform Dify, Source Code Analysis Series (5): CotChatAgentRunner in the Core Module core/agent

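This article digs into the core/agent module of the Dify framework, focusing on the CotChatAgentRunner class. As a chain-of-thought agent runner tuned for chat scenarios, its core job is to turn the conversation context into structured prompt messages, with support for multimodal input. The article covers four dimensions (role, core methods, construction flow, and technical details) and walks through the three core methods: building the system prompt, handling the user query, and assembling the full prompt chain. A worked example shows the complete path from user input to model prompt, and the article closes with the design highlights of the component.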


ATM006 · Published 2025-07-17 19:36:13

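Every article is short and to the point, with no padding.

A note from the author

This installment covers core/agent, a core module of the Dify framework. We will go through the code under the core/agent directory and explain it in plain terms: first the overall directory structure, then the key files one by one, and finally a summary of how the Agent framework is designed and how it works.

First, here is the complete structure of the core/agent directory: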

dify/api/core/agent
.
├── base_agent_runner.py            # Base implementation of the Agent framework
├── cot_agent_runner.py             # Chain of Thought (CoT) Agent Runner
├── cot_chat_agent_runner.py        # CoT Chat Agent Runner
├── cot_completion_agent_runner.py  # CoT Completion Agent Runner
├── entities.py                     # Core entities and data structures of the Agent framework
├── fc_agent_runner.py              # Function Calling (FC) Agent Runner
├── __init__.py
├── output_parser
│   └── cot_output_parser.py        # Chain of Thought output parser
└── prompt
    └── template.py                 # Agent prompt templates

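The CotChatAgentRunner class inherits from CotAgentRunner and is the chain-of-thought agent runner tuned for chat scenarios. Its core responsibility is to build prompts that follow the chain-of-thought pattern, combining the system instruction, the user question, the conversation history, and the tool information into a format the large language model can consume. The analysis below proceeds in four layers: role, core methods, construction flow, and technical details.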
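The inheritance relationship can be pictured as a template-method arrangement: a parent owns the shared logic and each subclass decides how the prompt is laid out. The sketch below is only an illustration under that assumption. BaseCotRunner, ChatStyleRunner, CompletionStyleRunner, and build_prompt are hypothetical names, not Dify's classes; only the method name _organize_prompt_messages comes from the article.

```python
from abc import ABC, abstractmethod


class BaseCotRunner(ABC):
    """Hypothetical parent: shared logic lives here, prompt layout is delegated."""

    def __init__(self, query: str) -> None:
        self.query = query

    def build_prompt(self) -> list[str]:
        # The parent never knows which layout it gets back.
        return self._organize_prompt_messages()

    @abstractmethod
    def _organize_prompt_messages(self) -> list[str]:
        ...


class ChatStyleRunner(BaseCotRunner):
    """Chat layout: one entry per role-tagged message."""

    def _organize_prompt_messages(self) -> list[str]:
        return ["system: follow the ReAct format", f"user: {self.query}"]


class CompletionStyleRunner(BaseCotRunner):
    """Completion layout: everything folded into a single text prompt."""

    def _organize_prompt_messages(self) -> list[str]:
        return [f"Follow the ReAct format.\nQuestion: {self.query}\nThought:"]


chat_prompt = ChatStyleRunner("Analyze the sales trend").build_prompt()
completion_prompt = CompletionStyleRunner("Analyze the sales trend").build_prompt()
print(chat_prompt)
print(completion_prompt)
```

The point of the split is that a chat model and a completion model can share one reasoning loop while receiving differently shaped prompts.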

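I. Role: the "prompt engineer" for chat scenarios

The core purpose of CotChatAgentRunner is to turn a complex conversation context into prompt messages that guide the model's reasoning. It addresses two key problems:

- Structured prompt construction: it organizes the system instruction, the tool list, the conversation history, and related information into a specific format (such as the ReAct pattern) so that the model reasons step by step.
- Multimodal support: it takes files uploaded by the user (such as images) and converts them into prompt content objects the model can consume.

For example, when the user sends a question together with a picture (such as "Analyze the trend in this sales chart"), CotChatAgentRunner will:

- wrap the picture (say, a bar chart of sales by region for January to June 2024) as image content attached to the user message;
- add the system instruction (such as "You are a data analyst; use the tools to analyze the chart data");
- add the conversation history (such as the user's earlier questions);
- list the available tools (such as a "data visualization tool" and a "trend forecasting tool");
- assemble all of this into the complete prompt and send it to the model.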

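II. Core methods: three prompt builders

The class defines three core methods, which build the system prompt, the user query, and the complete prompt chain respectively:

1. `_organize_system_prompt()`: building the system instruction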

def _organize_system_prompt(self) -> SystemPromptMessage:
    assert self.app_config.agent and self.app_config.agent.prompt
    first_prompt = self.app_config.agent.prompt.first_prompt

    # Fill in the template variables
    system_prompt = (
        first_prompt.replace("{{instruction}}", self._instruction)
        .replace("{{tools}}", json.dumps(jsonable_encoder(self._prompt_messages_tools)))
        .replace("{{tool_names}}", ", ".join([tool.name for tool in self._prompt_messages_tools]))
    )

    return SystemPromptMessage(content=system_prompt)

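Key operations:

The initial prompt template (first_prompt) is read from the configuration, and three variables are filled in dynamically:
- {{instruction}}: the instruction text, taken from self._instruction (for example, "analyze the sales trend");
- {{tools}}: a JSON string of the available tools (including each tool's name, description, and parameters);
- {{tool_names}}: a comma-separated list of tool names (such as sales_dataset, chart_generator).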

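Example output (simplified):

You are a professional data analyst. The available tools are:
[ {"name": "sales_dataset", "description": "Query the sales dataset", ...}, {"name": "chart_generator", "description": "Generate a chart", ...} ]
Use the tools as appropriate to solve the user's request.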
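The placeholder substitution above can be tried in isolation. The following is a minimal sketch, assuming a simplified template and a hypothetical ToolSpec dataclass in place of Dify's tool objects; it uses dataclasses.asdict where the excerpt uses jsonable_encoder, and fill_first_prompt is an illustrative name.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ToolSpec:
    """Hypothetical stand-in for a prompt-message tool (not Dify's class)."""
    name: str
    description: str


def fill_first_prompt(template: str, instruction: str, tools: list[ToolSpec]) -> str:
    """Substitute the three placeholders with chained str.replace calls."""
    tools_json = json.dumps([asdict(tool) for tool in tools])
    tool_names = ", ".join(tool.name for tool in tools)
    return (
        template.replace("{{instruction}}", instruction)
        .replace("{{tools}}", tools_json)
        .replace("{{tool_names}}", tool_names)
    )


# Simplified template, assumed for illustration only.
TEMPLATE = "{{instruction}}\nTools: {{tools}}\nValid actions: {{tool_names}}"
tools = [
    ToolSpec("sales_dataset", "Query the sales dataset"),
    ToolSpec("chart_generator", "Generate a chart"),
]
system_prompt = fill_first_prompt(TEMPLATE, "You are a data analyst.", tools)
print(system_prompt)
```

One property of chained replace worth keeping in mind: the substitutions run in order, so if the instruction text itself contained the literal {{tools}}, the second call would expand it as well.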

2. `_organize_user_query()`: handling user input (including multimodal input)

def _organize_user_query(self, query, prompt_messages):
    if self.files:  # the user uploaded files (such as images)
        contents = [TextPromptMessageContent(data=query)]  # text part

        # Read the image detail setting; fall back to LOW when it is not set
        image_detail_config = self.application_generate_entity.file_upload_config.image_config.detail
        image_detail_config = image_detail_config or ImagePromptMessageContent.DETAIL.LOW

        # Convert each file into a prompt content object the model can consume
        for file in self.files:
            contents.append(
                file_manager.to_prompt_message_content(
                    file, image_detail_config=image_detail_config
                )
            )

        prompt_messages.append(UserPromptMessage(content=contents))
    else:  # plain-text input
        prompt_messages.append(UserPromptMessage(content=query))

    return prompt_messages

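Key operations:

- Multimodal input: each uploaded file (such as a picture) is converted by file_manager.to_prompt_message_content into a prompt content object such as ImagePromptMessageContent and placed next to the text part. Judging from the code, the image content itself is handed to the model; no text caption is generated at this step.
- Image detail setting: image_detail_config carries the detail level attached to each image and falls back to ImagePromptMessageContent.DETAIL.LOW when nothing is configured. The level presumably tells the model how finely to process the image, with a lower level being cheaper and a higher level preserving more detail.

Example scenario:
The user sends the question "Analyze this chart" and uploads a picture (say, a line chart of quarterly sales for 2024). The method wraps the picture as image content and merges it with the text question into a single multimodal user message.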
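The branching between plain-text and multimodal input can be sketched without any Dify imports. The classes below (Detail, TextContent, ImageContent, UserMessage) are hypothetical stand-ins for Dify's prompt-message entities, and files are represented simply as URLs, which is an assumption made for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class Detail(str, Enum):
    """Hypothetical detail levels for image content."""
    LOW = "low"
    HIGH = "high"


@dataclass
class TextContent:
    data: str


@dataclass
class ImageContent:
    url: str
    detail: Detail


@dataclass
class UserMessage:
    # Either a plain string or a list of content parts.
    content: Union[str, list]


def organize_user_query(query: str, files: list[str], detail: Optional[Detail] = None) -> UserMessage:
    """Return a plain-text message, or a multi-part one when files are present."""
    if not files:
        return UserMessage(content=query)
    detail = detail or Detail.LOW  # same fallback idea as in the excerpt
    contents: list = [TextContent(data=query)]
    for url in files:
        contents.append(ImageContent(url=url, detail=detail))
    return UserMessage(content=contents)


text_only = organize_user_query("Analyze this chart", [])
with_image = organize_user_query("Analyze this chart", ["https://example.com/chart.png"])
print(text_only)
print(with_image)
```

Keeping the text-only case as a bare string, rather than a one-element list, mirrors the else branch of the excerpt and keeps simple prompts compatible with models that accept text only.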

3. `_organize_prompt_messages()`: assembling all prompt components

def _organize_prompt_messages(self) -> list[PromptMessage]:
    # 1. Build the system prompt
    system_message = self._organize_system_prompt()

    # 2. Build the current assistant message from the earlier reasoning steps
    if self._agent_scratchpad:
        assistant_message = AssistantPromptMessage(content="")
        for unit in self._agent_scratchpad:
            if unit.is_final():
                assistant_message.content += f"Final Answer: {unit.agent_response}"
            else:
                assistant_message.content += f"Thought: {unit.thought}\n\n"
                if unit.action_str:
                    assistant_message.content += f"Action: {unit.action_str}\n\n"
                if unit.observation:
                    assistant_message.content += f"Observation: {unit.observation}\n\n"
        assistant_messages = [assistant_message]
    else:
        assistant_messages = []

    # 3. Build the user query
    query_messages = self._organize_user_query(self._query, [])

    # 4. Combine the historic messages with the current ones
    if assistant_messages:
        historic_messages = self._organize_historic_prompt_messages(
            [system_message, *query_messages, *assistant_messages, UserPromptMessage(content="continue")]
        )
        messages = [
            system_message,
            *historic_messages,
            *query_messages,
            *assistant_messages,
            UserPromptMessage(content="continue"),
        ]
    else:
        historic_messages = self._organize_historic_prompt_messages([system_message, *query_messages])
        messages = [system_message, *historic_messages, *query_messages]

    return messages

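Key operations:

- Chain-of-thought formatting: the earlier reasoning steps (_agent_scratchpad) are rendered as ReAct-style text (Thought: ... Action: ... Observation: ...).
- History integration: _organize_historic_prompt_messages supplies the earlier conversation turns in a form the model can consume. When a scratchpad exists, _organize_prompt_messages also appends a UserPromptMessage with the content "continue" so that the model carries on reasoning.
- Multi-round support: across iterations, assistant_messages is rebuilt from the scratchpad each time, so the full chain of reasoning is preserved.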
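The scratchpad rendering described above is easy to reproduce on its own. Below is a minimal sketch in which ScratchpadUnit is a hypothetical dataclass carrying the same fields the excerpt reads (thought, action_str, observation, agent_response) plus a final flag that backs is_final(); it is not Dify's entity class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScratchpadUnit:
    """Hypothetical stand-in for one reasoning step."""
    thought: str = ""
    action_str: Optional[str] = None
    observation: Optional[str] = None
    agent_response: Optional[str] = None
    final: bool = False

    def is_final(self) -> bool:
        return self.final


def render_scratchpad(units: list[ScratchpadUnit]) -> str:
    """Concatenate the steps as ReAct-style text, as in the excerpt."""
    text = ""
    for unit in units:
        if unit.is_final():
            text += f"Final Answer: {unit.agent_response}"
        else:
            text += f"Thought: {unit.thought}\n\n"
            if unit.action_str:
                text += f"Action: {unit.action_str}\n\n"
            if unit.observation:
                text += f"Observation: {unit.observation}\n\n"
    return text


units = [
    ScratchpadUnit(
        thought="I need the sales data.",
        action_str='{"name": "sales_dataset"}',
        observation="Jan 1000, Feb 1200, Mar 1300",
    ),
    ScratchpadUnit(agent_response="Sales are rising.", final=True),
]
rendered = render_scratchpad(units)
print(rendered)
```

Because the whole scratchpad becomes a single assistant message, the model sees its own earlier Thought / Action / Observation text as something it already said, which is what makes a follow-up "continue" meaningful.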

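III. Construction flow: from user input to model prompt

Taking the user question "Analyze the sales trend for the last 3 months" as an example, here is how CotChatAgentRunner builds the prompt:

System prompt (simplified):

You are a professional data analyst. The available tools are: ["sales_dataset", "chart_generator"].

Answer in the following format:
Thought: I need to...
Action: {"name": "tool name", "parameters": {"param1": "value"}}
Observation: the result returned by the tool
... (repeat Thought / Action / Observation)
Final Answer: the final answer

User query:

User: Analyze the sales trend for the last 3 months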

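First round of thinking (assumed model output):

A:

Thought: I first need the sales data for the last 3 months, so I will use the sales_dataset tool.
Action: {"name": "sales_dataset", "parameters": {"time_range": "last 3 months"}}

Tool call and observation:
- The system calls the sales_dataset tool and retrieves the data;
- The result is appended to the chain of thought as an Observation:
  Observation: Sales for the last 3 months: January 1000, February 1200, March 1300, an upward trend.

Second round of thinking (the model continues from the observation):

Thought: The data is in and sales are trending upward. I need a chart to visualize this, so I will use the chart_generator tool.
Action: {"name": "chart_generator", "parameters": {"data": [...]}}

Final answer:

Final Answer: Sales over the last 3 months show an upward trend (January 1000 → February 1200 → March 1300). A trend chart has been generated for you (see attachment).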
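The walkthrough above implies a different message list in each round: the first round carries only the system prompt, history, and query, while later rounds add the rendered scratchpad and a "continue" turn. The sketch below shows that ordering with plain (role, content) tuples; assemble_messages is a hypothetical helper, and the tuples stand in for Dify's prompt-message classes.

```python
def assemble_messages(system, history, query, scratchpad_text=""):
    """Order the messages the way the excerpt of _organize_prompt_messages does."""
    messages = [("system", system), *history, ("user", query)]
    if scratchpad_text:
        # Later rounds: replay the reasoning so far, then ask the model to go on.
        messages.append(("assistant", scratchpad_text))
        messages.append(("user", "continue"))
    return messages


SYSTEM = "You are a professional data analyst."
QUERY = "Analyze the sales trend for the last 3 months"
HISTORY = [("user", "Hello"), ("assistant", "Hi, how can I help?")]

first_round = assemble_messages(SYSTEM, HISTORY, QUERY)
second_round = assemble_messages(
    SYSTEM,
    HISTORY,
    QUERY,
    scratchpad_text=(
        "Thought: I first need the sales data.\n\n"
        'Action: {"name": "sales_dataset"}\n\n'
        "Observation: Jan 1000, Feb 1200, Mar 1300\n\n"
    ),
)

for role, content in second_round:
    print(f"{role}: {content[:40]!r}")
```

Ending on a user turn matters for chat models, which generally expect to respond to a user message rather than to extend their own previous assistant message.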

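IV. Technical details and design highlights

Multimodal compatibility:
- file_manager.to_prompt_message_content converts files of different types (such as images and documents) into prompt content the model can consume, which enables cross-modal reasoning.
Type safety:
- The source uses assert isinstance(assistant_message.content, str) (omitted from the excerpt above) to guard the type and avoid type errors during string concatenation;
- jsonable_encoder handles JSON serialization of complex objects so that the tool list is converted to a string correctly.
Iteration handling:
- In multi-round iteration, UserPromptMessage(content="continue") prompts the model to keep reasoning instead of starting over;
- The tool list is updated dynamically (for example, all tools are removed on the last iteration, forcing the model to produce the final answer).
Template flexibility:
- Placeholders are filled by string replacement (replace) instead of hard-coding the prompt, so the prompt template can be customized through configuration to fit different scenarios.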
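The iteration-handling points above (the "continue" turn and dropping the tools on the last round) can be illustrated with a toy loop. This is a sketch under stated assumptions, not Dify's run loop: fake_model is a hypothetical stand-in for an LLM call that picks a tool whenever one is offered and answers only when none are, and run_agent is an illustrative name.

```python
def fake_model(messages, tools):
    """Hypothetical LLM stub: acts while tools are offered, answers otherwise."""
    if tools:
        return {"thought": "I need more data.", "action": tools[0], "final": None}
    return {"thought": "", "action": None, "final": "Sales are rising."}


def run_agent(query, tools, max_iterations=3):
    """Iterate until a final answer appears or the round budget is used up."""
    steps = []
    available = list(tools)
    for step in range(1, max_iterations + 1):
        if step == max_iterations:
            available = []  # last round: no tools left, so the model has to answer
        messages = [("user", query)]
        if steps:
            # Replay earlier steps and ask the model to carry on.
            messages += [("assistant", repr(steps)), ("user", "continue")]
        reply = fake_model(messages, available)
        steps.append(reply)
        if reply["final"] is not None:
            return reply["final"], steps
    return None, steps


answer, steps = run_agent("Analyze the sales trend", ["sales_dataset"], max_iterations=3)
print(answer, len(steps))
```

Capping the rounds and withdrawing the tools at the end trades completeness for a guaranteed answer: the agent cannot loop on tool calls indefinitely.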