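Prompt Templates in prompts.py Explained
The file defines two core prompt templates: REASON_PROMPT and RELEVANT_EXTRACTION_PROMPT. Both play a key role in DeepResearcher's reasoning process. The sections below walk through the structure and function of each.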
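REASON_PROMPT in Detail
REASON_PROMPT is the main prompt that guides the language model through reasoning and searching. It is made up of several parts:
1. Role and capability definition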
"You are a reasoning assistant with the ability to perform dataset searches to help "
"you answer the user's question accurately. You have special tools:\n\n"
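This part defines the model's role (a reasoning assistant) and its core capability (running dataset searches). Stating the role explicitly tells the model how it is expected to behave.
2. Tool usage instructions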
f"- To perform a search: write {BEGIN_SEARCH_QUERY} your query here {END_SEARCH_QUERY}.\n"
f"Then, the system will search and analyze relevant content, then provide you with helpful information in the format {BEGIN_SEARCH_RESULT} ...search results... {END_SEARCH_RESULT}.\n\n"
f"You can repeat the search process multiple times if necessary. The maximum number of search attempts is limited to {MAX_SEARCH_LIMIT}.\n\n"
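This part spells out how to use the search tool:
- the search query must be wrapped in specific markers
- the format in which the system returns results
- the search may be repeated as often as necessary
- the number of searches is capped at MAX_SEARCH_LIMIT
With these instructions, the model knows how to format its output so that it triggers a search.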
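To make the marker convention concrete, here is a minimal sketch of how a caller might pull queries out of the model's output and check the search budget. The marker strings, the value of MAX_SEARCH_LIMIT, and both helper functions are assumptions made for illustration; the real constants are defined alongside the prompt and may differ.

```python
import re

# Assumed values, for illustration only; the real constants live with the prompt.
BEGIN_SEARCH_QUERY = "<|begin_search_query|>"
END_SEARCH_QUERY = "<|end_search_query|>"
MAX_SEARCH_LIMIT = 6


def extract_search_queries(model_output: str) -> list[str]:
    """Return every query wrapped in the search markers, whitespace-trimmed."""
    # re.escape is needed because the markers contain regex metacharacters.
    pattern = re.escape(BEGIN_SEARCH_QUERY) + r"(.*?)" + re.escape(END_SEARCH_QUERY)
    return [q.strip() for q in re.findall(pattern, model_output, flags=re.DOTALL)]


def may_search_again(searches_done: int) -> bool:
    """True while the search budget stated in the prompt is not exhausted."""
    return searches_done < MAX_SEARCH_LIMIT
```

The non-greedy `(.*?)` keeps each match inside one marker pair, so an output containing several queries yields them separately.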
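3. Few-shot examples - Example 1
This is a complete example for a complex question, showing how a comparison question is resolved over several rounds of search: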
"-- Example 1 --\n"
"Question: \"Are both the directors of Jaws and Casino Royale from the same country?\"\n"
"Assistant:\n"
f" {BEGIN_SEARCH_QUERY}Who is the director of Jaws?{END_SEARCH_QUERY}\n\n"
"User:\n"
f" {BEGIN_SEARCH_RESULT}\nThe director of Jaws is Steven Spielberg...\n{END_SEARCH_RESULT}\n\n"
"Continues reasoning with the new information.\n"
"Assistant:\n"
f" {BEGIN_SEARCH_QUERY}Where is Steven Spielberg from?{END_SEARCH_QUERY}\n\n"
"User:\n"
f" {BEGIN_SEARCH_RESULT}\nSteven Allan Spielberg is an American filmmaker...\n{END_SEARCH_RESULT}\n\n"
"Continues reasoning with the new information...\n\n"
"Assistant:\n"
f" {BEGIN_SEARCH_QUERY}Who is the director of Casino Royale?{END_SEARCH_QUERY}\n\n"
"User:\n"
f" {BEGIN_SEARCH_RESULT}\nCasino Royale is a 2006 spy film directed by Martin Campbell...\n{END_SEARCH_RESULT}\n\n"
"Continues reasoning with the new information...\n\n"
"Assistant:\n"
f" {BEGIN_SEARCH_QUERY}Where is Martin Campbell from?{END_SEARCH_QUERY}\n\n"
"User:\n"
f" {BEGIN_SEARCH_RESULT}\nMartin Campbell (born 24 October 1943) is a New Zealand film and television director...\n{END_SEARCH_RESULT}\n\n"
"Continues reasoning with the new information...\n\n"
"Assistant:\nIt's enough to answer the question\n"
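This example shows:
- how to break a complex question into several simple queries
- how to build the next query on the result of the previous step
- how to stop searching once enough information has been gathered
- how to use the markers correctly
4. Few-shot examples - Example 2
This is a simpler example, showing how a factual question is resolved in two rounds of search: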
"-- Example 2 --\n"
"Question: \"When was the founder of craigslist born?\"\n"
"Assistant:\n"
f" {BEGIN_SEARCH_QUERY}Who was the founder of craigslist?{END_SEARCH_QUERY}\n\n"
"User:\n"
f" {BEGIN_SEARCH_RESULT}\nCraigslist was founded by Craig Newmark...\n{END_SEARCH_RESULT}\n\n"
"Continues reasoning with the new information.\n"
"Assistant:\n"
f" {BEGIN_SEARCH_QUERY} When was Craig Newmark born?{END_SEARCH_QUERY}\n\n"
"User:\n"
f" {BEGIN_SEARCH_RESULT}\nCraig Newmark was born on December 6, 1952...\n{END_SEARCH_RESULT}\n\n"
"Continues reasoning with the new information...\n\n"
"Assistant:\nIt's enough to answer the question\n"
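This example shows:
- how to handle a simple question that needs two reasoning steps
- how to identify the entity in the first step and then query a specific attribute of that entity in the second
5. Notes and reminders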
"**Remember**:\n"
f"- You have a dataset to search, so you just provide a proper search query.\n"
f"- Use {BEGIN_SEARCH_QUERY} to request a dataset search and end with {END_SEARCH_QUERY}.\n"
"- The language of query MUST be as the same as 'Question' or 'search result'.\n"
"- If no helpful information can be found, rewrite the search query to be less and precise keywords.\n"
"- When done searching, continue your reasoning.\n\n"
'Please answer the following question. You should think step by step to solve it.\n\n'
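This part lists the key reminders:
- the model has a dataset to search, so its job is to supply a proper search query rather than answer directly
- the correct use of the markers is restated
- the query language must match the language of the question or of the search results
- a fallback strategy is given: if nothing helpful is found, rewrite the query with fewer, more precise keywords
- reasoning should continue once searching is done
- the closing instruction asks the model to think step by step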
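RELEVANT_EXTRACTION_PROMPT in Detail
RELEVANT_EXTRACTION_PROMPT is the prompt used to extract relevant information from the retrieved documents. Its structure is more formal and takes the form of a task instruction:
1. Task description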
"""Task Instruction:
You are tasked with reading and analyzing web pages based on the following inputs: **Previous Reasoning Steps**, **Current Search Query**, and **Searched Web Pages**. Your objective is to extract relevant and helpful information for Current Search Query from the Searched Web Pages and seamlessly integrate this information into the Previous Reasoning Steps to continue reasoning for the original question.
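This part defines the nature of the task (analyzing web pages), the inputs (the previous reasoning steps, the current search query, and the searched web pages), and the goal (extracting relevant information and integrating it into the reasoning process).
2. Detailed guidelines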
Guidelines:
1. Analyze the Searched Web Pages:
- Carefully review the content of each searched web page.
- Identify factual information that is relevant to the Current Search Query and can aid in the reasoning process for the original question.
2. Extract Relevant Information:
- Select the information from the Searched Web Pages that directly contributes to advancing the **Previous Reasoning Steps**.
- Ensure that the extracted information is accurate and relevant.
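This part gives detailed guidance for two main steps:
- analyze the web page content and identify factual information relevant to the current query
- extract the information that advances the reasoning, making sure it is accurate and relevant
3. Output format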
3. Output Format:
- If the web pages provide helpful information for current search query: Present the information beginning with `**Final Information**` as shown below.
- The language of query MUST BE as the same as 'Search Query' or 'Web Pages'.\n"
Final Information
[Helpful information]
- If the web pages do not provide any helpful information for current search query: Output the following text.
Final Information
No helpful information found.
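This part specifies the output format:
- when helpful information is found: start with "Final Information", followed by the helpful information
- when nothing helpful is found: the fixed text "No helpful information found."
- the language-consistency requirement is restated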
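Because the output format is fixed, the caller can parse it mechanically. The sketch below shows one possible way to do that. The function name and the decision to treat a missing marker as "nothing found" are assumptions, not something the prompt file prescribes.

```python
from typing import Optional

FINAL_INFO_MARKER = "**Final Information**"      # marker named in the prompt text
NO_INFO_TEXT = "No helpful information found."   # fixed fallback text from the prompt


def parse_extraction_output(model_output: str) -> Optional[str]:
    """Return the helpful information, or None if the model found nothing."""
    _, found, tail = model_output.partition(FINAL_INFO_MARKER)
    if not found:
        return None  # marker missing: treated here as no usable result (assumption)
    info = tail.strip()
    if not info or info == NO_INFO_TEXT:
        return None
    return info
```

Returning None for both the fallback text and a malformed reply lets the calling loop handle "no result" in one place, for example by rewriting the query.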
4. Input placeholders
Inputs:
- Previous Reasoning Steps: {prev_reasoning}
- Current Search Query: {search_query}
- Searched Web Pages: {document}
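This part defines placeholders for the three key inputs:
- {prev_reasoning}: the previous reasoning steps, which provide context
- {search_query}: the current search query, which sets the focus of the extraction
- {document}: the content of the searched web pages, which is the source for the extraction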
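As an illustration of how the three placeholders get filled, the sketch below substitutes values with `str.format`. EXTRACTION_TEMPLATE is an abbreviated stand-in that reproduces only the "Inputs" section, and `build_extraction_prompt` is a hypothetical helper. Whether the project itself uses `str.format` for the substitution is an assumption.

```python
# Abbreviated stand-in for RELEVANT_EXTRACTION_PROMPT: only the "Inputs" part is
# reproduced; the real template carries the full task instructions above it.
EXTRACTION_TEMPLATE = (
    "Inputs:\n"
    "- Previous Reasoning Steps: {prev_reasoning}\n"
    "- Current Search Query: {search_query}\n"
    "- Searched Web Pages: {document}\n"
)


def build_extraction_prompt(prev_reasoning: str, search_query: str, document: str) -> str:
    """Substitute the three placeholders with concrete values (hypothetical helper)."""
    return EXTRACTION_TEMPLATE.format(
        prev_reasoning=prev_reasoning,
        search_query=search_query,
        document=document,
    )


prompt = build_extraction_prompt(
    prev_reasoning="The director of Jaws is Steven Spielberg.",
    search_query="Where is Steven Spielberg from?",
    document="Steven Allan Spielberg is an American filmmaker...",
)
print(prompt)
```

`str.format` only interprets braces in the template itself, so braces inside a retrieved document pass through unchanged.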
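How the Two Prompts Work Together
The two prompts cooperate within DeepResearcher's workflow:
- REASON_PROMPT guides the language model to produce reasoning steps and search queries, forming the skeleton of the thought process
- RELEVANT_EXTRACTION_PROMPT guides the language model to extract relevant content from the retrieved information, filling in the details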
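This division of labor gives the system:
- a clear chain of reasoning
- targeted information retrieval
- effective extraction of relevant information
- a coherent thought process
The design of these two prompts is what allows DeepResearcher to approximate a human-like reasoning process and tackle complex questions.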
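The interplay described above can be sketched as a simple loop that alternates between a reasoning call and an extraction call. This is only an outline under stated assumptions: the marker strings and limit are illustrative values, and `reason` and `extract` are hypothetical callables standing in for the model calls driven by REASON_PROMPT and RELEVANT_EXTRACTION_PROMPT. It is not DeepResearcher's actual control flow.

```python
from typing import Callable

# Assumed marker values and limit, for illustration only.
BEGIN_SEARCH_QUERY = "<|begin_search_query|>"
END_SEARCH_QUERY = "<|end_search_query|>"
BEGIN_SEARCH_RESULT = "<|begin_search_result|>"
END_SEARCH_RESULT = "<|end_search_result|>"
MAX_SEARCH_LIMIT = 6


def run_research(
    question: str,
    reason: Callable[[str], str],        # hypothetical: model call using REASON_PROMPT
    extract: Callable[[str, str], str],  # hypothetical: retrieval + RELEVANT_EXTRACTION_PROMPT
) -> str:
    """Alternate reasoning and extraction until no query is emitted or the limit is hit."""
    transcript = question
    for _ in range(MAX_SEARCH_LIMIT):
        step = reason(transcript)
        transcript += "\n" + step
        start = step.find(BEGIN_SEARCH_QUERY)
        if start == -1:
            break  # no query emitted: the model is done searching
        query_start = start + len(BEGIN_SEARCH_QUERY)
        end = step.find(END_SEARCH_QUERY, query_start)
        if end == -1:
            break  # unterminated query: stop rather than guess
        query = step[query_start:end].strip()
        info = extract(transcript, query)
        # Feed the extracted information back in the format the prompt promises.
        transcript += f"\n{BEGIN_SEARCH_RESULT}\n{info}\n{END_SEARCH_RESULT}"
    return transcript
```

Passing the two model calls in as parameters keeps the loop testable with stub functions and leaves the choice of model client open.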