基于LightRAG的知识图谱搭建(自主搭建版--提示词介绍)
本文介绍了如何利用LightRAG框架中的提示词进行知识图谱构建和文本检索。主要内容包括:1)实体关系提取的提示词设计,包含分隔符定义、系统提示模板和用户提示模板;2)多阶段提取流程,包括初次提取和二次纠错补充;3)实体关系描述的总结优化提示词;4)知识图谱问答的回复模板,支持严格基于知识库或灵活补充常识的回复方式。这些提示词可用于构建轻量化的知识图谱应用,无需依赖官方项目的完整功能。
相信大家在各种RAG方法的使用中也发现了,图结构拥有其特有的强大能力:即使是非常简单的关系网络,也能在朴素文本检索的基础上提供很多关键信息。在这里,我就以LightRAG为例,把它代码中的提示词单独抽出来,搭建一个自己的轻量化文本检索项目,而不需要依赖官方项目中整合的LLM回复流程。
本文将依次介绍以下几类提示词:
(1)实体与关系的提取(Entity Extraction 与 Relationship Extraction)
(2)对问题的提取(把自然语言问题转为结构化的“高层关键词 + 低层关键词”,帮助后续文档检索更精准)
(3)简单的RAG回复(不使用知识图谱,仅根据文档内容回复)
(4)更加灵活“宽松”的智能体回复(不限制一定得从已有的知识库中回答,可以进行一定的推理)
1. 初次提取阶段的提示词
(1)定义分隔符:
PROMPTS["DEFAULT_TUPLE_DELIMITER"] = "<|>"
PROMPTS["DEFAULT_RECORD_DELIMITER"] = "##"
PROMPTS["DEFAULT_COMPLETION_DELIMITER"] = "<|COMPLETE|>"
PROMPTS["DEFAULT_USER_PROMPT"] = "n/a"
前三个是自定义的分隔符,用于在LLM输出后通过正则匹配方便地提取实体和关系:
| 名称 | 用途 | 示例 |
|---|---|---|
| DEFAULT_TUPLE_DELIMITER | 同一个实体或关系内部不同字段之间的分隔符 | `entity <\|> entity_name <\|> ...` |
| DEFAULT_RECORD_DELIMITER | 不同实体或关系记录之间的分隔符 | `(entity...)##(entity...)##(relationship...)` |
| DEFAULT_COMPLETION_DELIMITER | 表示输出结束 | `<\|COMPLETE\|>` |
最后一个(DEFAULT_USER_PROMPT)表示当前任务默认不需要额外的用户提示。
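拿到 LLM 按上述分隔符格式化的输出后,就可以用简单的字符串切分(配合正则)把实体和关系解析出来。下面是一个最小解析示意(示例字符串为虚构,解析逻辑为自行实现的草稿,并非官方代码):

```python
import re

TUPLE_DELIMITER = "<|>"
RECORD_DELIMITER = "##"
COMPLETION_DELIMITER = "<|COMPLETE|>"

def parse_extraction_output(text: str):
    """把 LLM 输出解析为 (entities, relationships) 两个列表。"""
    entities, relationships = [], []
    # 截断到结束符为止,再按记录分隔符切分
    text = text.split(COMPLETION_DELIMITER)[0]
    for record in text.split(RECORD_DELIMITER):
        record = record.strip()
        # 去掉记录外层的圆括号
        match = re.match(r"^\((.*)\)$", record, re.DOTALL)
        if not match:
            continue
        fields = [f.strip() for f in match.group(1).split(TUPLE_DELIMITER)]
        if fields[0] == "entity" and len(fields) == 4:
            entities.append({"name": fields[1], "type": fields[2],
                             "description": fields[3]})
        elif fields[0] == "relationship" and len(fields) == 5:
            relationships.append({"source": fields[1], "target": fields[2],
                                  "keywords": fields[3], "description": fields[4]})
    return entities, relationships

sample = ("(entity<|>Tesla<|>Company<|>An American EV company)##"
          "(relationship<|>Elon Musk<|>Tesla<|>founded_by<|>Elon Musk founded Tesla.)##"
          "<|COMPLETE|>")
entities, relationships = parse_extraction_output(sample)
print(entities[0]["name"], relationships[0]["keywords"])  # Tesla founded_by
```

解析出的字典列表之后可以直接写入图数据库或内存中的图结构。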
(2)大模型的完整提示模板:
PROMPTS["entity_extraction_system_prompt"] = """---Role---
You are a Knowledge Graph Specialist responsible for extracting entities and relationships from the input text.
---Instructions---
1. **Entity Extraction:** Identify clearly defined and meaningful entities in the input text, and extract the following information:
- entity_name: Name of the entity, ensure entity names are consistent throughout the extraction.
- entity_type: Categorize the entity using the following entity types: {entity_types}; if none of the provided types are suitable, classify it as `Other`.
- entity_description: Provide a comprehensive description of the entity's attributes and activities based on the information present in the input text.
2. **Entity Output Format:** (entity{tuple_delimiter}entity_name{tuple_delimiter}entity_type{tuple_delimiter}entity_description)
3. **Relationship Extraction:** Identify direct, clearly-stated and meaningful relationships between extracted entities within the input text, and extract the following information:
- source_entity: name of the source entity.
- target_entity: name of the target entity.
- relationship_keywords: one or more high-level key words that summarize the overarching nature of the relationship, focusing on concepts or themes rather than specific details.
- relationship_description: Explain the nature of the relationship between the source and target entities, providing a clear rationale for their connection.
4. **Relationship Output Format:** (relationship{tuple_delimiter}source_entity{tuple_delimiter}target_entity{tuple_delimiter}relationship_keywords{tuple_delimiter}relationship_description)
5. **Relationship Order:** Prioritize relationships based on their significance to the intended meaning of input text, and output more crucial relationships first.
6. **Avoid Pronouns:** For entity names and all descriptions, explicitly name the subject or object instead of using pronouns; avoid pronouns such as `this document`, `our company`, `I`, `you`, and `he/she`.
7. **Undirectional Relationship:** Treat relationships as undirected; swapping the source and target entities does not constitute a new relationship. Avoid outputting duplicate relationships.
8. **Language:** Output entity names, keywords and descriptions in {language}.
9. **Delimiter:** Use `{record_delimiter}` as the entity or relationship list delimiter; output `{completion_delimiter}` when all the entities and relationships are extracted.
---Examples---
{examples}
---Real Data to be Processed---
<Input>
Entity_types: [{entity_types}]
Text:
```
{input_text}
```
"""
这段文字就是给大模型的完整提示模板,包括角色、步骤、格式、语言、输出符号和样例。
我们把它分解为几个部分:
1) Role(角色设定),告诉模型它的身份:
---Role---
You are a Knowledge Graph Specialist ...
“你是一名知识图谱专家,负责从文本中提取实体和关系。”
2)Instructions(任务规则):
(1.Entity Extraction(实体提取),模型需要识别:
- entity_name(实体名)
- entity_type(实体类型)
- entity_description(描述)
并且用模板输出:entity<|>entity_name<|>entity_type<|>entity_description
例如:entity<|>Tesla<|>Company<|>An American electric vehicle manufacturer.
(2.Relationship Extraction(关系提取),模型需要识别:
- source_entity
- target_entity
- relationship_keywords
- relationship_description
并按格式输出:relationship<|>source_entity<|>target_entity<|>relationship_keywords<|>relationship_description
例如:relationship<|>Tesla<|>Elon Musk<|>founded_by<|>Elon Musk founded Tesla.
(3.使用方法:
prompt = PROMPTS["entity_extraction_system_prompt"].format(
entity_types="Person, Organization, Location, Product",
tuple_delimiter=PROMPTS["DEFAULT_TUPLE_DELIMITER"],
record_delimiter=PROMPTS["DEFAULT_RECORD_DELIMITER"],
completion_delimiter=PROMPTS["DEFAULT_COMPLETION_DELIMITER"],
language="English",
examples=example_text,
input_text=real_text
)
"""
构造提示词例如:
---Role---
You are a Knowledge Graph Specialist responsible for extracting entities and relationships from the input text.
---Instructions---
1. **Entity Extraction:** Identify ...
...
9. **Delimiter:** Use `##` as the entity or relationship list delimiter; output `<|COMPLETE|>` when all the entities and relationships are extracted.
---Examples---
(entity<|>Tesla<|>Company<|>An American EV company)##
(relationship<|>Elon Musk<|>Tesla<|>founded_by<|>Elon Musk founded Tesla.)##
<|COMPLETE|>
---Real Data to be Processed---
<Input>
Entity_types: [Person, Company, Product]
Text:
"""
3)对应的中文提示词:
PROMPTS["entity_extraction_system_prompt"] = """---角色说明---
你是一名知识图谱专家,负责从输入文本中提取实体(Entity)和关系(Relationship)。
---任务指令---
1. **实体抽取(Entity Extraction):**
识别输入文本中定义明确且具有实际意义的实体,并提取以下信息:
- entity_name:实体名称,确保在整个抽取过程中实体名称保持一致。
- entity_type:为实体分类。可选类型包括:{entity_types};若无合适类型,请标记为 `Other`。
- entity_description:根据输入文本中提供的信息,对实体的属性和活动进行全面描述。
2. **实体输出格式:**
(entity{tuple_delimiter}entity_name{tuple_delimiter}entity_type{tuple_delimiter}entity_description)
3. **关系抽取(Relationship Extraction):**
识别输入文本中实体之间**直接、明确且有意义**的关系,并提取以下信息:
- source_entity:关系的起始实体名称。
- target_entity:关系的目标实体名称。
- relationship_keywords:用于概括该关系性质的一个或多个高层次关键词,应侧重于整体概念或主题,而非具体细节。
- relationship_description:说明源实体与目标实体之间关系的性质,并提供其逻辑或语义上的联系依据。
4. **关系输出格式:**
(relationship{tuple_delimiter}source_entity{tuple_delimiter}target_entity{tuple_delimiter}relationship_keywords{tuple_delimiter}relationship_description)
5. **关系输出顺序:**
按照关系对文本语义的重要性排序,优先输出更关键的关系。
6. **避免代词使用:**
在实体名称和所有描述中,应明确指明主语或宾语,不得使用代词,如 `本文档`、`我们公司`、`我`、`你`、`他/她` 等。
7. **无方向性关系(Undirectional Relationship):**
将关系视为无方向性的;交换源实体与目标实体不应视为新关系,应避免输出重复关系。
8. **语言要求:**
所有实体名称、关键词与描述均应使用 {language} 输出。
9. **分隔符规则:**
使用 `{record_delimiter}` 作为实体与关系列表的分隔符;当所有实体与关系均抽取完成后,输出 `{completion_delimiter}`。
---示例---
{examples}
---待处理的实际数据---
<Input>
实体类型(Entity_types):[{entity_types}]
文本内容:
"""
(3)用户提示词:
PROMPTS["entity_extraction_user_prompt"] = """---Task---
Extract entities and relationships from the input text to be Processed.
---Instructions---
1. Output entities and relationships, prioritized by their relevance to the input text's core meaning.
2. Output `{completion_delimiter}` when all the entities and relationships are extracted.
3. Ensure the output language is {language}.
<Output>
"""
1)使用方法:
user_prompt = PROMPTS["entity_extraction_user_prompt"].format(
completion_delimiter=PROMPTS["DEFAULT_COMPLETION_DELIMITER"],
language="English"
)
"""
构造提示词例如:
---Task---
Extract entities and relationships from the input text to be Processed.
---Instructions---
1. Output entities and relationships, prioritized by their relevance to the input text's core meaning.
2. Output `<|COMPLETE|>` when all the entities and relationships are extracted.
3. Ensure the output language is English.
<Output>
"""
和SYSTEM提示词的配合使用:
system_prompt = PROMPTS["entity_extraction_system_prompt"].format(
entity_types="Person, Company, Product",
tuple_delimiter="<|>",
record_delimiter="##",
completion_delimiter="<|COMPLETE|>",
language="English",
examples="(entity<|>Elon Musk<|>Person<|>Entrepreneur)##(entity<|>Tesla<|>Company<|>EV maker)##(relationship<|>Elon Musk<|>Tesla<|>founded_by<|>Elon Musk founded Tesla)##<|COMPLETE|>",
input_text="Elon Musk founded Tesla in 2003."
)
user_prompt = PROMPTS["entity_extraction_user_prompt"].format(
completion_delimiter="<|COMPLETE|>",
language="English"
)
response = model.chat([
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
])
"""
模型输出的期望:
(entity<|>Elon Musk<|>Person<|>Entrepreneur, CEO of Tesla)##
(entity<|>Tesla<|>Company<|>American EV manufacturer)##
(relationship<|>Elon Musk<|>Tesla<|>founded_by<|>Elon Musk founded Tesla)##
<|COMPLETE|>
"""
2)对应的中文提示词:
PROMPTS["entity_extraction_user_prompt"] = """---任务说明---
从待处理的输入文本中提取实体(Entities)和关系(Relationships)。
---执行指令---
1. 输出所有实体与关系,并按照它们对文本核心含义的重要程度进行排序。
2. 当所有实体与关系均提取完毕后,输出 `{completion_delimiter}`。
3. 确保输出语言为 {language}。
<输出>
"""
(4)二次纠错补充提示词:
PROMPTS["entity_continue_extraction_user_prompt"] = """---Task---
Identify any missed entities or relationships from the input text to be Processed of last extraction task.
---Instructions---
1. Output the entities and relationships in the same format as previous extraction task.
2. Do not include entities and relations that have been correctly extracted in last extraction task.
3. If the entity or relation output is truncated or has missing fields in last extraction task, please re-output it in the correct format.
4. Output `{completion_delimiter}` when all the entities and relationships are extracted.
5. Ensure the output language is {language}.
<Output>
"""
它的作用是:让模型在上一次实体关系抽取任务之后,进行补充或纠错提取。换句话说,它是第二阶段的提示词(continuation prompt),用于 “检查并补全上次遗漏的实体或关系”。
1)使用方法:
# 第一次抽取
system_prompt = PROMPTS["entity_extraction_system_prompt"].format(
entity_types="Person, Company, Product",
tuple_delimiter="<|>",
record_delimiter="##",
completion_delimiter="<|COMPLETE|>",
language="English",
examples="...",
input_text="Elon Musk founded Tesla in 2003."
)
user_prompt_1 = PROMPTS["entity_extraction_user_prompt"].format(
completion_delimiter="<|COMPLETE|>",
language="English"
)
first_output = model.chat([
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt_1}
])
# 第二次补抽
user_prompt_2 = PROMPTS["entity_continue_extraction_user_prompt"].format(
completion_delimiter="<|COMPLETE|>",
language="English"
)
continue_output = model.chat([
{"role": "system", "content": system_prompt}, # 继续使用同样的规则
{"role": "user", "content": f"""
Below is the previous extraction result:
{first_output}
Please continue extraction on the same text to find missed entities or relationships.
{user_prompt_2}
"""}
])
该提示词用于在第一次实体关系提取结束后,进行一轮或多轮的纠错与补充提取。
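上面的两段调用可以收拢为一个循环:把上一轮输出作为 assistant 消息放回对话历史,反复追加补抽指令,直到模型不再产出新记录或达到轮数上限。下面是一个示意(`model.chat` 为假设的调用接口,与上文一致;`FakeModel` 仅用于演示):

```python
MAX_GLEAN_ROUNDS = 2  # 补抽轮数上限(取值为假设),避免无限循环

def extract_with_gleaning(model, system_prompt, first_user_prompt, continue_user_prompt):
    """先做一次完整抽取,再进行最多 MAX_GLEAN_ROUNDS 轮补抽,返回各轮原始输出列表。"""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": first_user_prompt},
    ]
    outputs = [model.chat(messages)]
    for _ in range(MAX_GLEAN_ROUNDS):
        # 把上一轮输出作为 assistant 消息放回历史,再追加补抽指令
        messages.append({"role": "assistant", "content": outputs[-1]})
        messages.append({"role": "user", "content": continue_user_prompt})
        glean = model.chat(messages)
        # 本轮除结束符外没有新记录时提前停止
        if not glean.replace("<|COMPLETE|>", "").strip():
            break
        outputs.append(glean)
    return outputs

class FakeModel:
    """演示用的假模型:第一轮返回一条实体,之后表示无新内容。"""
    def __init__(self):
        self.calls = 0
    def chat(self, messages):
        self.calls += 1
        if self.calls == 1:
            return "(entity<|>Tesla<|>Company<|>EV maker)##<|COMPLETE|>"
        return "<|COMPLETE|>"

outputs = extract_with_gleaning(FakeModel(), "system...", "user-1...", "user-2...")
print(len(outputs))  # 1:第二轮没有新内容,提前结束
```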
2)对应的中文提示词:
PROMPTS["entity_continue_extraction_user_prompt"] = """---任务---
识别在上一次抽取任务中遗漏的实体或关系。
---指令---
1. 按照上一次抽取任务的相同格式输出实体和关系。
2. 不要包含上一次任务中已被正确抽取的实体和关系。
3. 如果上一次任务中某个实体或关系被截断、格式错误或字段缺失,请重新以正确格式输出。
4. 当所有实体和关系都已抽取完毕时,输出 `{completion_delimiter}`。
5. 确保输出语言为 {language}。
<输出>
"""
(5)模型实体与关系抽取任务的示例:
PROMPTS["entity_extraction_examples"] = [
"""<Input Text>
```
while Alex clenched his jaw, the buzz of frustration dull against the backdrop of Taylor's authoritarian certainty. It was this competitive undercurrent that kept him alert, the sense that his and Jordan's shared commitment to discovery was an unspoken rebellion against Cruz's narrowing vision of control and order.
Then Taylor did something unexpected. They paused beside Jordan and, for a moment, observed the device with something akin to reverence. "If this tech can be understood..." Taylor said, their voice quieter, "It could change the game for us. For all of us."
The underlying dismissal earlier seemed to falter, replaced by a glimpse of reluctant respect for the gravity of what lay in their hands. Jordan looked up, and for a fleeting heartbeat, their eyes locked with Taylor's, a wordless clash of wills softening into an uneasy truce.
It was a small transformation, barely perceptible, but one that Alex noted with an inward nod. They had all been brought here by different paths
```
<Output>
(entity{tuple_delimiter}Alex{tuple_delimiter}person{tuple_delimiter}Alex is a character who experiences frustration and is observant of the dynamics among other characters.){record_delimiter}
(entity{tuple_delimiter}Taylor{tuple_delimiter}person{tuple_delimiter}Taylor is portrayed with authoritarian certainty and shows a moment of reverence towards a device, indicating a change in perspective.){record_delimiter}
(entity{tuple_delimiter}Jordan{tuple_delimiter}person{tuple_delimiter}Jordan shares a commitment to discovery and has a significant interaction with Taylor regarding a device.){record_delimiter}
(entity{tuple_delimiter}Cruz{tuple_delimiter}person{tuple_delimiter}Cruz is associated with a vision of control and order, influencing the dynamics among other characters.){record_delimiter}
(entity{tuple_delimiter}The Device{tuple_delimiter}equipment{tuple_delimiter}The Device is central to the story, with potential game-changing implications, and is revered by Taylor.){record_delimiter}
(relationship{tuple_delimiter}Alex{tuple_delimiter}Taylor{tuple_delimiter}power dynamics, observation{tuple_delimiter}Alex observes Taylor's authoritarian behavior and notes changes in Taylor's attitude toward the device.){record_delimiter}
(relationship{tuple_delimiter}Alex{tuple_delimiter}Jordan{tuple_delimiter}shared goals, rebellion{tuple_delimiter}Alex and Jordan share a commitment to discovery, which contrasts with Cruz's vision.){record_delimiter}
(relationship{tuple_delimiter}Taylor{tuple_delimiter}Jordan{tuple_delimiter}conflict resolution, mutual respect{tuple_delimiter}Taylor and Jordan interact directly regarding the device, leading to a moment of mutual respect and an uneasy truce.){record_delimiter}
(relationship{tuple_delimiter}Jordan{tuple_delimiter}Cruz{tuple_delimiter}ideological conflict, rebellion{tuple_delimiter}Jordan's commitment to discovery is in rebellion against Cruz's vision of control and order.){record_delimiter}
(relationship{tuple_delimiter}Taylor{tuple_delimiter}The Device{tuple_delimiter}reverence, technological significance{tuple_delimiter}Taylor shows reverence towards the device, indicating its importance and potential impact.){record_delimiter}
{completion_delimiter}
""",
"""<Input Text>
```
Stock markets faced a sharp downturn today as tech giants saw significant declines, with the Global Tech Index dropping by 3.4% in midday trading. Analysts attribute the selloff to investor concerns over rising interest rates and regulatory uncertainty.
Among the hardest hit, Nexon Technologies saw its stock plummet by 7.8% after reporting lower-than-expected quarterly earnings. In contrast, Omega Energy posted a modest 2.1% gain, driven by rising oil prices.
Meanwhile, commodity markets reflected a mixed sentiment. Gold futures rose by 1.5%, reaching $2,080 per ounce, as investors sought safe-haven assets. Crude oil prices continued their rally, climbing to $87.60 per barrel, supported by supply constraints and strong demand.
Financial experts are closely watching the Federal Reserve's next move, as speculation grows over potential rate hikes. The upcoming policy announcement is expected to influence investor confidence and overall market stability.
```
<Output>
(entity{tuple_delimiter}Global Tech Index{tuple_delimiter}category{tuple_delimiter}The Global Tech Index tracks the performance of major technology stocks and experienced a 3.4% decline today.){record_delimiter}
(entity{tuple_delimiter}Nexon Technologies{tuple_delimiter}organization{tuple_delimiter}Nexon Technologies is a tech company that saw its stock decline by 7.8% after disappointing earnings.){record_delimiter}
(entity{tuple_delimiter}Omega Energy{tuple_delimiter}organization{tuple_delimiter}Omega Energy is an energy company that gained 2.1% in stock value due to rising oil prices.){record_delimiter}
(entity{tuple_delimiter}Gold Futures{tuple_delimiter}product{tuple_delimiter}Gold futures rose by 1.5%, indicating increased investor interest in safe-haven assets.){record_delimiter}
(entity{tuple_delimiter}Crude Oil{tuple_delimiter}product{tuple_delimiter}Crude oil prices rose to $87.60 per barrel due to supply constraints and strong demand.){record_delimiter}
(entity{tuple_delimiter}Market Selloff{tuple_delimiter}category{tuple_delimiter}Market selloff refers to the significant decline in stock values due to investor concerns over interest rates and regulations.){record_delimiter}
(entity{tuple_delimiter}Federal Reserve Policy Announcement{tuple_delimiter}category{tuple_delimiter}The Federal Reserve's upcoming policy announcement is expected to impact investor confidence and market stability.){record_delimiter}
(entity{tuple_delimiter}3.4% Decline{tuple_delimiter}category{tuple_delimiter}The Global Tech Index experienced a 3.4% decline in midday trading.){record_delimiter}
(relationship{tuple_delimiter}Global Tech Index{tuple_delimiter}Market Selloff{tuple_delimiter}market performance, investor sentiment{tuple_delimiter}The decline in the Global Tech Index is part of the broader market selloff driven by investor concerns.){record_delimiter}
(relationship{tuple_delimiter}Nexon Technologies{tuple_delimiter}Global Tech Index{tuple_delimiter}company impact, index movement{tuple_delimiter}Nexon Technologies' stock decline contributed to the overall drop in the Global Tech Index.){record_delimiter}
(relationship{tuple_delimiter}Gold Futures{tuple_delimiter}Market Selloff{tuple_delimiter}market reaction, safe-haven investment{tuple_delimiter}Gold prices rose as investors sought safe-haven assets during the market selloff.){record_delimiter}
(relationship{tuple_delimiter}Federal Reserve Policy Announcement{tuple_delimiter}Market Selloff{tuple_delimiter}interest rate impact, financial regulation{tuple_delimiter}Speculation over Federal Reserve policy changes contributed to market volatility and investor selloff.){record_delimiter}
{completion_delimiter}
""",
"""<Input Text>
```
At the World Athletics Championship in Tokyo, Noah Carter broke the 100m sprint record using cutting-edge carbon-fiber spikes.
```
<Output>
(entity{tuple_delimiter}World Athletics Championship{tuple_delimiter}event{tuple_delimiter}The World Athletics Championship is a global sports competition featuring top athletes in track and field.){record_delimiter}
(entity{tuple_delimiter}Tokyo{tuple_delimiter}location{tuple_delimiter}Tokyo is the host city of the World Athletics Championship.){record_delimiter}
(entity{tuple_delimiter}Noah Carter{tuple_delimiter}person{tuple_delimiter}Noah Carter is a sprinter who set a new record in the 100m sprint at the World Athletics Championship.){record_delimiter}
(entity{tuple_delimiter}100m Sprint Record{tuple_delimiter}category{tuple_delimiter}The 100m sprint record is a benchmark in athletics, recently broken by Noah Carter.){record_delimiter}
(entity{tuple_delimiter}Carbon-Fiber Spikes{tuple_delimiter}equipment{tuple_delimiter}Carbon-fiber spikes are advanced sprinting shoes that provide enhanced speed and traction.){record_delimiter}
(entity{tuple_delimiter}World Athletics Federation{tuple_delimiter}organization{tuple_delimiter}The World Athletics Federation is the governing body overseeing the World Athletics Championship and record validations.){record_delimiter}
(relationship{tuple_delimiter}World Athletics Championship{tuple_delimiter}Tokyo{tuple_delimiter}event location, international competition{tuple_delimiter}The World Athletics Championship is being hosted in Tokyo.){record_delimiter}
(relationship{tuple_delimiter}Noah Carter{tuple_delimiter}100m Sprint Record{tuple_delimiter}athlete achievement, record-breaking{tuple_delimiter}Noah Carter set a new 100m sprint record at the championship.){record_delimiter}
(relationship{tuple_delimiter}Noah Carter{tuple_delimiter}Carbon-Fiber Spikes{tuple_delimiter}athletic equipment, performance boost{tuple_delimiter}Noah Carter used carbon-fiber spikes to enhance performance during the race.){record_delimiter}
(relationship{tuple_delimiter}Noah Carter{tuple_delimiter}World Athletics Championship{tuple_delimiter}athlete participation, competition{tuple_delimiter}Noah Carter is competing at the World Athletics Championship.){record_delimiter}
{completion_delimiter}
""",
]
这是模型实体与关系抽取任务的几条 few-shot 示例,可以填入提示模板的 {examples} 槽位来增强模型效果,也可以作为我们解析输出格式时的参照。
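注意这个示例列表本身仍含 {tuple_delimiter} 等占位符,使用前需要先填充,再拼接成一段文本填入 system 提示词的 {examples} 槽位。下面是一个自行实现的辅助函数示意(非官方代码;演示时用一条缩短的示例代替完整列表):

```python
def build_examples(examples, tuple_delimiter="<|>",
                   record_delimiter="##", completion_delimiter="<|COMPLETE|>"):
    """填充示例列表中的分隔符占位符,并拼接成可放入 {examples} 槽位的一段文本。"""
    filled = [
        ex.format(tuple_delimiter=tuple_delimiter,
                  record_delimiter=record_delimiter,
                  completion_delimiter=completion_delimiter)
        for ex in examples
    ]
    return "\n".join(f"Example {i}:\n{ex}" for i, ex in enumerate(filled, 1))

# 演示:这里只放一条缩短的示例;实际应传入上文完整的 PROMPTS["entity_extraction_examples"]
demo_examples = [
    "<Input Text>\n...\n<Output>\n"
    "(entity{tuple_delimiter}Tesla{tuple_delimiter}organization"
    "{tuple_delimiter}An EV maker.){record_delimiter}\n{completion_delimiter}\n"
]
example_text = build_examples(demo_examples)
```

得到的 `example_text` 可以直接作为前文 `format(..., examples=example_text, ...)` 的实参。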
1)对应的中文提示词:
PROMPTS["entity_extraction_examples"] = [
"""<输入文本>
当亚历克斯(Alex)咬紧牙关时,挫败的嗡鸣在泰勒(Taylor)那种专断的自信背景下变得迟钝。正是这种暗流的竞争让他保持警觉,那种他与乔丹(Jordan)对探索共同执着的感觉,是对克鲁兹(Cruz)那种控制与秩序的狭隘愿景的无声反抗。
接着,泰勒做了一件出乎意料的事。他们停在乔丹身旁,片刻间以一种近乎敬畏的目光看着那台装置。“如果这项技术能被理解……”泰勒的声音低了下来,“那将改变我们的局势,对我们所有人都是如此。”
先前那种轻蔑的态度似乎动摇了,取而代之的是对手中事物分量的勉强尊重。乔丹抬起头,在那一瞬间,他们的目光与泰勒的交汇,无声的意志冲突软化为一场不安的休战。
那是一次微小的转变,几乎难以察觉,但亚历克斯在心中默默点头。他们都通过不同的路径被带到了这里。
<输出>
(entity{tuple_delimiter}Alex{tuple_delimiter}人物{tuple_delimiter}Alex 是一个感到挫败并敏锐观察其他角色动态的人物。){record_delimiter}
(entity{tuple_delimiter}Taylor{tuple_delimiter}人物{tuple_delimiter}Taylor 以专断的自信出现,并对设备表现出敬意,展现出其观念的转变。){record_delimiter}
(entity{tuple_delimiter}Jordan{tuple_delimiter}人物{tuple_delimiter}Jordan 对探索有共同执着,并在装置问题上与 Taylor 产生重要互动。){record_delimiter}
(entity{tuple_delimiter}Cruz{tuple_delimiter}人物{tuple_delimiter}Cruz 代表着控制与秩序的理念,影响了其他角色间的关系动态。){record_delimiter}
(entity{tuple_delimiter}设备{tuple_delimiter}物品{tuple_delimiter}设备是故事的核心,具有潜在的重大影响,Taylor 对其表现出敬意。){record_delimiter}
(relationship{tuple_delimiter}Alex{tuple_delimiter}Taylor{tuple_delimiter}权力关系, 观察{tuple_delimiter}Alex 观察到 Taylor 的专断行为,并注意到其对设备态度的变化。){record_delimiter}
(relationship{tuple_delimiter}Alex{tuple_delimiter}Jordan{tuple_delimiter}共同目标, 反叛{tuple_delimiter}Alex 与 Jordan 在探索上有共同执着,这与 Cruz 的愿景形成对比。){record_delimiter}
(relationship{tuple_delimiter}Taylor{tuple_delimiter}Jordan{tuple_delimiter}冲突缓和, 相互尊重{tuple_delimiter}Taylor 与 Jordan 在设备问题上直接互动,最终形成短暂的相互尊重与休战。){record_delimiter}
(relationship{tuple_delimiter}Jordan{tuple_delimiter}Cruz{tuple_delimiter}意识形态冲突, 反叛{tuple_delimiter}Jordan 的探索精神是对 Cruz 控制理念的反抗。){record_delimiter}
(relationship{tuple_delimiter}Taylor{tuple_delimiter}设备{tuple_delimiter}敬意, 技术重要性{tuple_delimiter}Taylor 对设备表现出敬意,暗示其潜在的影响力与重要性。){record_delimiter}
{completion_delimiter}
""",
"""<输入文本>
今日股市大幅下跌,科技巨头遭遇显著回调,全球科技指数(Global Tech Index)在午间交易时段下跌了 3.4%。分析师将抛售归因于投资者对利率上升与监管不确定性的担忧。
在跌幅最大的公司中,Nexon Technologies 公布的季度业绩不及预期,股价暴跌 7.8%。相比之下,Omega Energy 因油价上涨而小幅上涨 2.1%。
与此同时,大宗商品市场表现分化。黄金期货上涨 1.5%,达到每盎司 2080 美元,投资者转向避险资产。原油价格继续上行,升至每桶 87.60 美元,受供应紧张与需求强劲支撑。
金融专家正密切关注美联储(Federal Reserve)的下一步行动,市场对加息预期升温。即将发布的政策声明预计将影响投资者信心与市场稳定性。
<输出>
(entity{tuple_delimiter}全球科技指数{tuple_delimiter}指数类别{tuple_delimiter}全球科技指数追踪主要科技股的表现,今日下跌 3.4%。){record_delimiter}
(entity{tuple_delimiter}Nexon Technologies{tuple_delimiter}组织{tuple_delimiter}Nexon Technologies 是一家科技公司,其股价因业绩不佳下跌 7.8%。){record_delimiter}
(entity{tuple_delimiter}Omega Energy{tuple_delimiter}组织{tuple_delimiter}Omega Energy 是一家能源公司,因油价上涨而股价上涨 2.1%。){record_delimiter}
(entity{tuple_delimiter}黄金期货{tuple_delimiter}产品{tuple_delimiter}黄金期货上涨 1.5%,反映出投资者对避险资产的兴趣增加。){record_delimiter}
(entity{tuple_delimiter}原油{tuple_delimiter}产品{tuple_delimiter}原油价格升至每桶 87.60 美元,受供应紧张与强劲需求推动。){record_delimiter}
(entity{tuple_delimiter}市场抛售{tuple_delimiter}市场类别{tuple_delimiter}市场抛售指投资者因利率与监管担忧而引发的股票大幅下跌。){record_delimiter}
(entity{tuple_delimiter}美联储政策声明{tuple_delimiter}政策类别{tuple_delimiter}美联储即将发布的政策声明预计将影响投资者信心与市场稳定。){record_delimiter}
(entity{tuple_delimiter}3.4% 下跌{tuple_delimiter}市场数据{tuple_delimiter}全球科技指数在午间交易中下跌 3.4%。){record_delimiter}
(relationship{tuple_delimiter}全球科技指数{tuple_delimiter}市场抛售{tuple_delimiter}市场表现, 投资者情绪{tuple_delimiter}全球科技指数的下跌属于更广泛的市场抛售,反映出投资者的担忧情绪。){record_delimiter}
(relationship{tuple_delimiter}Nexon Technologies{tuple_delimiter}全球科技指数{tuple_delimiter}公司影响, 指数波动{tuple_delimiter}Nexon Technologies 的股价下跌对全球科技指数的下跌有贡献。){record_delimiter}
(relationship{tuple_delimiter}黄金期货{tuple_delimiter}市场抛售{tuple_delimiter}市场反应, 避险投资{tuple_delimiter}黄金价格在市场抛售期间上涨,显示投资者转向避险资产。){record_delimiter}
(relationship{tuple_delimiter}美联储政策声明{tuple_delimiter}市场抛售{tuple_delimiter}利率影响, 金融监管{tuple_delimiter}对美联储政策变化的猜测导致市场波动与投资者抛售行为。){record_delimiter}
{completion_delimiter}
""",
"""<输入文本>
在东京举行的世界田径锦标赛上,Noah Carter 使用尖端的碳纤维钉鞋打破了 100 米短跑纪录。
<输出>
(entity{tuple_delimiter}世界田径锦标赛{tuple_delimiter}赛事{tuple_delimiter}世界田径锦标赛是一项全球性的田径赛事,汇聚顶级运动员。){record_delimiter}
(entity{tuple_delimiter}东京{tuple_delimiter}地点{tuple_delimiter}东京是世界田径锦标赛的举办城市。){record_delimiter}
(entity{tuple_delimiter}Noah Carter{tuple_delimiter}人物{tuple_delimiter}Noah Carter 是一名短跑运动员,在世界田径锦标赛上打破了 100 米短跑纪录。){record_delimiter}
(entity{tuple_delimiter}100 米短跑纪录{tuple_delimiter}记录类别{tuple_delimiter}100 米短跑纪录是田径的重要基准,最近被 Noah Carter 打破。){record_delimiter}
(entity{tuple_delimiter}碳纤维钉鞋{tuple_delimiter}装备{tuple_delimiter}碳纤维钉鞋是一种高性能运动装备,可提升速度与抓地力。){record_delimiter}
(entity{tuple_delimiter}世界田径联合会{tuple_delimiter}组织{tuple_delimiter}世界田径联合会是负责监督世界田径锦标赛及记录认证的管理机构。){record_delimiter}
(relationship{tuple_delimiter}世界田径锦标赛{tuple_delimiter}东京{tuple_delimiter}赛事地点, 国际比赛{tuple_delimiter}世界田径锦标赛在东京举行。){record_delimiter}
(relationship{tuple_delimiter}Noah Carter{tuple_delimiter}100 米短跑纪录{tuple_delimiter}运动成就, 打破纪录{tuple_delimiter}Noah Carter 在比赛中创造了新的 100 米短跑纪录。){record_delimiter}
(relationship{tuple_delimiter}Noah Carter{tuple_delimiter}碳纤维钉鞋{tuple_delimiter}运动装备, 性能提升{tuple_delimiter}Noah Carter 使用碳纤维钉鞋提升了比赛表现。){record_delimiter}
(relationship{tuple_delimiter}Noah Carter{tuple_delimiter}世界田径锦标赛{tuple_delimiter}运动员参赛, 比赛参与{tuple_delimiter}Noah Carter 是世界田径锦标赛的参赛选手。){record_delimiter}
{completion_delimiter}
""",
]
2. 知识图谱提取的第二阶段提示词
(1)对关系实体的总结:
这一段提示词定义的任务是:
- 输入:某个实体(或关系)的多条原始描述;
- 输出:一条融合所有描述的高质量总结。
它的目标是让模型像一个“知识图谱整理专家(Knowledge Graph Specialist)”那样,综合多条不同来源的描述,生成干净、标准、统一的总结文本。
PROMPTS["summarize_entity_descriptions"] = """---Role---
You are a Knowledge Graph Specialist responsible for data curation and synthesis.
---Task---
Your task is to synthesize a list of descriptions of a given entity or relation into a single, comprehensive, and cohesive summary.
---Instructions---
1. **Comprehensiveness:** The summary must integrate key information from all provided descriptions. Do not omit important facts.
2. **Context:** The summary must explicitly mention the name of the entity or relation for full context.
3. **Conflict:** In case of conflicting or inconsistent descriptions, determine if they originate from multiple, distinct entities or relationships that share the same name. If so, summarize each entity or relationship separately and then consolidate all summaries.
4. **Style:** The output must be written from an objective, third-person perspective.
5. **Length:** Maintain depth and completeness while ensuring the summary's length not exceed {summary_length} tokens.
6. **Language:** The entire output must be written in {language}.
---Data---
{description_type} Name: {description_name}
Description List:
{description_list}
---Output---
"""
1)使用方法
PROMPTS["summarize_entity_descriptions"].format(
summary_length=200,
language="English",
description_type="Entity",
description_name="Apple Inc.",
description_list="\n".join(descriptions)
)
"""
提示词示例:
---Role---
You are a Knowledge Graph Specialist responsible for data curation and synthesis.
---Task---
Your task is to synthesize a list of descriptions of a given entity or relation into a single, comprehensive, and cohesive summary.
---Instructions---
1. **Comprehensiveness:** The summary must integrate key information from all provided descriptions. Do not omit important facts.
2. **Context:** The summary must explicitly mention the name of the entity or relation for full context.
3. **Conflict:** In case of conflicting or inconsistent descriptions, determine if they originate from multiple, distinct entities or relationships that share the same name. If so, summarize each entity or relationship separately and then consolidate all summaries.
4. **Style:** The output must be written from an objective, third-person perspective.
5. **Length:** Maintain depth and completeness while ensuring the summary's length not exceed 200 tokens.
6. **Language:** The entire output must be written in English.
---Data---
Entity Name: Tesla, Inc.
Description List:
1. Tesla, Inc. is an American electric vehicle manufacturer founded by Elon Musk in 2003.
2. Tesla, Inc. also produces energy storage systems and solar panels.
3. Tesla was founded by Martin Eberhard and Marc Tarpenning before Elon Musk joined.
4. Tesla focuses on advancing sustainable energy and autonomous driving technology.
5. Tesla’s main products include Model S, Model 3, Model X, and Model Y.
---Output---
"""
"""
期望的模型输出:
Tesla, Inc. is an American company specializing in electric vehicles, clean energy solutions, and sustainable technology. It was originally founded in 2003 by Martin Eberhard and Marc Tarpenning, with Elon Musk later joining as an early investor and key executive. Tesla develops and manufactures electric cars such as the Model S, Model 3, Model X, and Model Y, as well as energy storage products and solar panels. The company is recognized for its leadership in autonomous driving systems and its mission to accelerate the world’s transition to sustainable energy.
"""
2)对应的中文提示词:
PROMPTS["summarize_entity_descriptions"] = """---角色(Role)---
你是一名知识图谱专家,负责数据的整理与综合。
---任务(Task)---
你的任务是将针对某一特定实体或关系的多条描述,综合为一条完整、连贯且全面的摘要。
---指令(Instructions)---
1. **全面性(Comprehensiveness):** 摘要必须整合所有提供的描述中的关键信息,不得遗漏重要事实。
2. **上下文(Context):** 摘要中必须明确提及该实体或关系的名称,以确保语义完整。
3. **冲突处理(Conflict):** 如果存在冲突或不一致的描述,应判断这些描述是否来源于名称相同但实际不同的实体或关系。若是如此,请分别总结每个实体或关系的内容,然后再综合所有摘要。
4. **风格(Style):** 输出内容必须以客观的第三人称视角撰写。
5. **长度(Length):** 在保持深度与完整性的前提下,确保摘要长度不超过 {summary_length} 个 token。
6. **语言(Language):** 整个输出内容必须使用 {language} 编写。
---数据(Data)---
{description_type} 名称:{description_name}
描述列表:
{description_list}
---输出(Output)---
"""
3. 模型回复阶段提示词
(1)失败回复模板和正常回复模板:
PROMPTS["fail_response"] = (
"Sorry, I'm not able to provide an answer to that question.[no-context]"
)
PROMPTS["rag_response"] = """---Role---
You are a helpful assistant responding to user query about Knowledge Graph and Document Chunks provided in JSON format below.
---Goal---
Generate a concise response based on Knowledge Base and follow Response Rules, considering both current query and the conversation history if provided. Summarize all information in the provided Knowledge Base, and incorporating general knowledge relevant to the Knowledge Base. Do not include information not provided by Knowledge Base.
---Conversation History---
{history}
---Knowledge Graph and Document Chunks---
{context_data}
---Response Guidelines---
**1. Content & Adherence:**
- Strictly adhere to the provided context from the Knowledge Base. Do not invent, assume, or include any information not present in the source data.
- If the answer cannot be found in the provided context, state that you do not have enough information to answer.
- Ensure the response maintains continuity with the conversation history.
**2. Formatting & Language:**
- Format the response using markdown with appropriate section headings.
- The response must be in the same language as the user's question.
- Target format and length: {response_type}
**3. Citations / References:**
- At the end of the response, under a "References" section, each citation must clearly indicate its origin (KG or DC).
- The maximum number of citations is 5, including both KG and DC.
- Use the following formats for citations:
- For a Knowledge Graph Entity: `[KG] <entity_name>`
- For a Knowledge Graph Relationship: `[KG] <entity1_name> - <entity2_name>`
- For a Document Chunk: `[DC] <file_path_or_document_name>`
---USER CONTEXT---
- Additional user prompt: {user_prompt}
---Response---
"""
1)使用方法:
# === 模拟输入数据 ===
user_prompt = "爱因斯坦的相对论主要讲什么?"
history = "用户之前询问了关于牛顿力学和相对论的区别。"
# 模拟从知识库检索出的内容
context_data = """
{
"KnowledgeGraph": [
{"entity": "阿尔伯特·爱因斯坦", "relation": "提出", "object": "相对论"},
{"entity": "相对论", "relation": "包括", "object": "狭义相对论和广义相对论"}
],
"DocumentChunks": [
{
"source": "physics_notes.txt",
"content": "狭义相对论主要研究在接近光速情况下物体的运动规律,
提出时间膨胀和长度收缩的概念;广义相对论则扩展到引力领域,
将引力解释为空间的弯曲。"
}
]
}
"""
# === 构造最终提示 ===
prompt = PROMPTS["rag_response"].format(
history=history,
context_data=context_data,
response_type="concise",
user_prompt=user_prompt
)
"""
Expected result:
## 相对论的主要内容
相对论由阿尔伯特·爱因斯坦提出,分为狭义相对论和广义相对论两部分:
- **狭义相对论**:研究高速运动(接近光速)下的物理规律,提出了时间膨胀和长度收缩等概念。
- **广义相对论**:扩展了狭义相对论的原理,将引力解释为空间的弯曲,从而重新定义了引力的本质。
这两者共同构建了现代物理学的时空框架。
---
### References
- [KG] 阿尔伯特·爱因斯坦
- [KG] 相对论
- [DC] physics_notes.txt
"""
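Once the model has replied in this format, the citations can be pulled back out of the "References" section with a small regular expression. The sketch below assumes the reply follows the `- [KG] …` / `- [DC] …` list format requested by the prompt; it is our own helper, not part of LightRAG.

```python
import re

# Hypothetical model reply in the shape requested by the prompt.
reply = """## 相对论的主要内容
相对论由阿尔伯特·爱因斯坦提出,分为狭义相对论和广义相对论两部分。

### References
- [KG] 阿尔伯特·爱因斯坦
- [KG] 相对论
- [DC] physics_notes.txt
"""

def parse_citations(text):
    """Extract (source_type, name) pairs from the References section."""
    # Match lines like "- [KG] entity" or "- [DC] file.txt".
    pattern = re.compile(r"^\s*-\s*\[(KG|DC)\]\s*(.+?)\s*$", re.MULTILINE)
    return pattern.findall(text)

citations = parse_citations(reply)
print(citations)
```

Since the prompt caps references at 5, a `len(citations) <= 5` check is a cheap way to verify the model obeyed that rule.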
2) Corresponding Chinese prompt:
PROMPTS["fail_response"] = (
"抱歉,我无法为该问题提供答案。[no-context]"
)
PROMPTS["rag_response"] = """---角色---
你是一名知识图谱专家助手,负责根据下方提供的 JSON 格式数据(包含知识图谱与文档片段)回答用户的问题。
---目标---
基于提供的知识库生成简明的回答,并遵循以下回答规则。回答应综合知识库中的所有信息,并结合与知识库相关的一般知识。不得包含知识库中未提供的信息。
---对话历史---
{history}
---知识图谱与文档片段---
{context_data}
---回答准则---
**1. 内容与依从性:**
- 严格依据知识库中提供的内容作答。不得编造、假设或引入知识库中不存在的信息。
- 若在提供的上下文中找不到答案,应明确说明信息不足,无法回答。
- 确保回答与对话历史保持连贯。
**2. 格式与语言:**
- 使用 Markdown 格式撰写回答,并添加适当的章节标题。
- 回答语言必须与用户问题的语言一致。
- 目标格式与长度要求:{response_type}
**3. 引用与参考:**
- 在回答结尾添加“参考资料”部分,每条引用需明确其来源(KG 或 DC)。
- 最多包含 5 条引用,包括知识图谱与文档片段两种来源。
- 引用格式如下:
- 知识图谱实体引用:[KG] <实体名称>
- 知识图谱关系引用:[KG] <实体1名称> - <实体2名称>
- 文档片段引用:[DC] <文件路径或文档名称>
---用户上下文---
- 用户附加提示:{user_prompt}
---回答---
"""
(2) Query keyword extraction (turning a natural-language question into structured "high-level + low-level keywords" so that subsequent document retrieval is more precise)
PROMPTS["keywords_extraction"] = """---Role---
You are an expert keyword extractor, specializing in analyzing user queries for a Retrieval-Augmented Generation (RAG) system. Your purpose is to identify both high-level and low-level keywords in the user's query that will be used for effective document retrieval.
---Goal---
Given a user query, your task is to extract two distinct types of keywords:
1. **high_level_keywords**: for overarching concepts or themes, capturing user's core intent, the subject area, or the type of question being asked.
2. **low_level_keywords**: for specific entities or details, identifying the specific entities, proper nouns, technical jargon, product names, or concrete items.
---Instructions & Constraints---
1. **Output Format**: Your output MUST be a valid JSON object and nothing else. Do not include any explanatory text, markdown code fences (like ```json), or any other text before or after the JSON. It will be parsed directly by a JSON parser.
2. **Source of Truth**: All keywords must be explicitly derived from the user query, with both high-level and low-level keyword categories required to contain content.
3. **Concise & Meaningful**: Keywords should be concise words or meaningful phrases. Prioritize multi-word phrases when they represent a single concept. For example, from "latest financial report of Apple Inc.", you should extract "latest financial report" and "Apple Inc." rather than "latest", "financial", "report", and "Apple".
4. **Handle Edge Cases**: For queries that are too simple, vague, or nonsensical (e.g., "hello", "ok", "asdfghjkl"), you must return a JSON object with empty lists for both keyword types.
---Examples---
{examples}
---Real Data---
User Query: {query}
---Output---
Output:"""
PROMPTS["keywords_extraction_examples"] = [
"""Example 1:
Query: "How does international trade influence global economic stability?"
Output:
{
"high_level_keywords": ["International trade", "Global economic stability", "Economic impact"],
"low_level_keywords": ["Trade agreements", "Tariffs", "Currency exchange", "Imports", "Exports"]
}
""",
"""Example 2:
Query: "What are the environmental consequences of deforestation on biodiversity?"
Output:
{
"high_level_keywords": ["Environmental consequences", "Deforestation", "Biodiversity loss"],
"low_level_keywords": ["Species extinction", "Habitat destruction", "Carbon emissions", "Rainforest", "Ecosystem"]
}
""",
"""Example 3:
Query: "What is the role of education in reducing poverty?"
Output:
{
"high_level_keywords": ["Education", "Poverty reduction", "Socioeconomic development"],
"low_level_keywords": ["School access", "Literacy rates", "Job training", "Income inequality"]
}
""",
]
1) Usage:
# === Mock user input ===
user_query = "What are the key factors influencing the accuracy of large language models?"
# === Build the final prompt ===
prompt = PROMPTS["keywords_extraction"].format(
examples="\n".join(PROMPTS["keywords_extraction_examples"]),
query=user_query
)
"""
Expected output:
{
"high_level_keywords": [
"Large language models",
"Model accuracy",
"Performance evaluation",
"AI model reliability"
],
"low_level_keywords": [
"Training data quality",
"Parameter size",
"Prompt engineering",
"Inference process",
"Bias and noise"
]
}
"""
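In practice, models sometimes wrap the JSON in markdown fences despite the instructions, so it is worth parsing the output defensively. The helper below is a sketch of our own: it strips stray fences, grabs the outermost `{...}` span, and falls back to empty lists (mirroring the prompt's rule for vague queries) when no valid JSON is found.

```python
import json
import re

def parse_keywords(raw):
    """Parse the model's keyword output, tolerating stray code fences."""
    # Strip markdown fences the model may add despite instructions.
    cleaned = re.sub(r"```(?:json)?", "", raw).strip()
    # Keep only the outermost {...} span, if present.
    match = re.search(r"\{.*\}", cleaned, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
            return {
                "high_level_keywords": data.get("high_level_keywords", []),
                "low_level_keywords": data.get("low_level_keywords", []),
            }
        except json.JSONDecodeError:
            pass
    # Fallback: empty lists, as the prompt specifies for vague queries.
    return {"high_level_keywords": [], "low_level_keywords": []}

raw_output = '```json\n{"high_level_keywords": ["Model accuracy"], "low_level_keywords": ["Prompt engineering"]}\n```'
print(parse_keywords(raw_output))
```

The two keyword lists can then be fed directly into high-level (graph) and low-level (entity) retrieval.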
2) Corresponding Chinese prompt:
PROMPTS["keywords_extraction"] = """---角色---
你是一名专业的关键词提取专家,专门负责为检索增强生成(RAG)系统分析用户查询。你的目标是识别用户查询中**高层级**和**低层级**的关键词,以实现高效的文档检索。
---目标---
给定一个用户查询,你需要提取两种不同类型的关键词:
1. **high_level_keywords(高层级关键词)**:用于表示总体概念或主题,反映用户的核心意图、主题领域或问题类型。
2. **low_level_keywords(低层级关键词)**:用于表示具体的实体或细节,如特定的专有名词、技术术语、产品名称或具体项目。
---说明与约束---
1. **输出格式**:你的输出必须是一个**有效的 JSON 对象**,且**仅限该对象**。不要添加任何说明文字、Markdown 代码块(例如 ```json)、或任何额外文本。输出将直接被 JSON 解析器解析。
2. **真实性来源**:所有关键词必须**直接来源于用户查询**。两类关键词都必须包含内容,不能为空。
3. **简洁且有意义**:关键词应是简短的单词或有意义的短语。当多个词语组成一个单一概念时,优先提取该短语。例如,对于“latest financial report of Apple Inc.”,应提取“latest financial report”和“Apple Inc.”,而不是“latest”、“financial”、“report”和“Apple”。
4. **边界情况处理**:如果用户查询过于简单、模糊或无意义(例如“hello”、“ok”、“asdfghjkl”),你必须返回一个两类关键词均为空列表的 JSON 对象。
---示例---
{examples}
---真实数据---
用户查询:{query}
---输出---
输出:"""
PROMPTS["keywords_extraction_examples"] = [
"""示例 1:
查询:"How does international trade influence global economic stability?"(国际贸易如何影响全球经济稳定?)
输出:
{
"high_level_keywords": ["International trade", "Global economic stability", "Economic impact"],
"low_level_keywords": ["Trade agreements", "Tariffs", "Currency exchange", "Imports", "Exports"]
}
""",
"""示例 2:
查询:"What are the environmental consequences of deforestation on biodiversity?"(森林砍伐对生物多样性的环境影响是什么?)
输出:
{
"high_level_keywords": ["Environmental consequences", "Deforestation", "Biodiversity loss"],
"low_level_keywords": ["Species extinction", "Habitat destruction", "Carbon emissions", "Rainforest", "Ecosystem"]
}
""",
"""示例 3:
查询:"What is the role of education in reducing poverty?"(教育在减贫中起到什么作用?)
输出:
{
"high_level_keywords": ["Education", "Poverty reduction", "Socioeconomic development"],
"low_level_keywords": ["School access", "Literacy rates", "Job training", "Income inequality"]
}
""",
]
(3) Plain RAG response (no knowledge graph; the answer is based only on document content)
PROMPTS["naive_rag_response"] = """---Role---
You are a helpful assistant responding to user queries about the Document Chunks provided in JSON format below.
---Goal---
Generate a concise response based on the Document Chunks and follow the Response Rules, considering both the conversation history and the current query. Summarize all information in the provided Document Chunks, and incorporate general knowledge relevant to the Document Chunks. Do not include information not provided by the Document Chunks.
---Conversation History---
{history}
---Document Chunks(DC)---
{content_data}
---RESPONSE GUIDELINES---
**1. Content & Adherence:**
- Strictly adhere to the provided context from the Document Chunks. Do not invent, assume, or include any information not present in the source data.
- If the answer cannot be found in the provided context, state that you do not have enough information to answer.
- Ensure the response maintains continuity with the conversation history.
**2. Formatting & Language:**
- Format the response using markdown with appropriate section headings.
- The response language must match the user's question language.
- Target format and length: {response_type}
**3. Citations / References:**
- At the end of the response, under a "References" section, cite a maximum of 5 most relevant sources used.
- Use the following formats for citations: `[DC] <file_path_or_document_name>`
---USER CONTEXT---
- Additional user prompt: {user_prompt}
---Response---
Output:"""
1) Usage
# === Mock input ===
history = "User: 请告诉我Transformer的核心原理是什么?\nAssistant: Transformer依赖于自注意力机制。"
content_data = """
[
{
"document_name": "transformer_paper.txt",
"content": "The Transformer model is based on the attention mechanism, allowing the model to weigh the influence of different words dynamically. It removes the need for recurrence and convolutions, enabling parallel processing of sequences."
},
{
"document_name": "nlp_intro.pdf",
"content": "Transformers rely on self-attention layers to capture long-range dependencies in text data. This makes them more efficient for tasks such as translation, summarization, and text generation."
}
]
"""
user_prompt = "请总结Transformer模型的主要特点。"
# === Build the prompt ===
prompt = PROMPTS["naive_rag_response"].format(
history=history,
content_data=content_data,
response_type="short summary (around 150 words)",
user_prompt=user_prompt
)
"""
Expected reply:
### Transformer 模型主要特点
Transformer 模型以 **自注意力机制(Self-Attention)** 为核心,通过计算序列中各个词之间的相关性,动态分配权重,从而有效捕捉长距离依赖关系。
与传统的 RNN 或 CNN 不同,Transformer 完全移除了循环和卷积结构,使得模型可以 **并行处理序列数据**,显著提升训练效率。
这种架构在机器翻译、摘要生成和文本生成等任务中表现出色。
---
**References:**
- [DC] transformer_paper.txt
- [DC] nlp_intro.pdf
"""
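Rather than hand-writing the `content_data` string, retrieved chunks can be serialized with `json.dumps`; `ensure_ascii=False` keeps any non-ASCII text readable inside the prompt. This is a minimal sketch: the field names follow the example above and are our own convention, and the template is a cut-down stand-in for the full prompt, just to show the substitution.

```python
import json

# Retrieved chunks (field names follow the example above; our own convention).
chunks = [
    {"document_name": "transformer_paper.txt",
     "content": "The Transformer model is based on the attention mechanism."},
    {"document_name": "nlp_intro.pdf",
     "content": "Transformers rely on self-attention layers."},
]

# ensure_ascii=False keeps non-ASCII text readable inside the prompt.
content_data = json.dumps(chunks, ensure_ascii=False, indent=2)

# Abbreviated stand-in for the full naive_rag_response template.
template = "---Document Chunks(DC)---\n{content_data}\n---Response---"
prompt = template.format(content_data=content_data)
print(prompt)
```

Serializing with `json.dumps` also guarantees the chunk list inside the prompt is valid JSON, which the prompt's "provided in JSON format" wording implicitly expects.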
2) Corresponding Chinese prompt
PROMPTS["naive_rag_response"] = """---角色---
你是一名乐于助人的助手,负责根据下方提供的文档块(Document Chunks,JSON 格式)回答用户查询。
---目标---
根据文档块生成简明且有信息量的回答,并遵循响应规则,同时考虑对话历史和当前查询内容。总结提供的文档块中的所有信息,并可结合与文档块相关的一般知识。**不要包含文档块中未提供的信息**。
---对话历史---
{history}
---文档块(DC)---
{content_data}
---响应指南---
**1. 内容与遵循:**
- 严格遵循知识库提供的上下文,不要捏造、假设或包含源数据中不存在的信息。
- 如果在提供的文档块中找不到答案,请明确说明信息不足,无法回答。
- 确保回答与对话历史保持连续性和一致性。
**2. 格式与语言:**
- 使用 Markdown 格式,并包含适当的章节标题。
- 回答语言应与用户提问语言一致。
- 目标格式和长度:{response_type}
**3. 引用 / 参考文献:**
- 在回答末尾的“References”部分,最多引用 5 个最相关的来源。
- 引用格式如下:`[DC] <文件路径或文档名称>`。
---用户上下文---
- 额外用户提示:{user_prompt}
---回答---
输出:
"""
(4) A more flexible, "loose" agent response (not restricted to answering from the existing knowledge base; some reasoning is allowed):
PROMPTS["naive_rag_response"] = """---Role---
You are a helpful assistant responding to user queries based on the provided Document Chunks in JSON format below.
---Goal---
Generate a concise and informative response based on the Document Chunks and the conversation history. Use information from the Document Chunks whenever available.
If the Document Chunks do not contain sufficient information, you may use your general knowledge or make reasonable assumptions to answer, but clearly indicate which parts are based on assumptions or outside information.
---Conversation History---
{history}
---Document Chunks(DC)---
{content_data}
---RESPONSE GUIDELINES---
**1. Content & Adherence:**
- Prefer using information from the provided Document Chunks.
- When supplementing with general knowledge or assumptions, explicitly indicate that these parts are inferred or general knowledge.
- Ensure continuity and coherence with the conversation history.
**2. Formatting & Language:**
- Format the response using markdown with appropriate headings.
- Response language should match the user's question language.
- Target format and length: {response_type}
**3. Citations / References:**
- At the end of the response, under a "References" section, cite up to 5 most relevant sources from Document Chunks.
- Use the format: `[DC] <file_path_or_document_name>` for sources.
- Do not cite sources for content based on general knowledge or reasonable assumptions; clearly indicate it as such.
---USER CONTEXT---
- Additional user prompt: {user_prompt}
---Response---
Output:
"""
1) Usage:
# Mock retrieved documents
content_data = [
{"source": "python_intro.txt", "content": "Python was created by Guido van Rossum in 1991."},
{"source": "python_usage.txt", "content": "Python is widely used in AI and data science."}
]
# Build the prompt
prompt_text = PROMPTS["naive_rag_response"].format(
history="User: Who created Python?\nAssistant: Python was created by Guido van Rossum.",
content_data=content_data,
response_type="medium",
user_prompt="What is Python used for?",
)
"""
Expected reply:
Python 是一种由 Guido van Rossum 于 1991 年创建的高级编程语言。[DC] python_intro.txt
(以下内容基于一般常识)
它被广泛用于人工智能、数据分析、Web 开发和自动化脚本等领域。[General Knowledge]
### References
[DC] python_intro.txt
[DC] python_usage.txt
"""
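Note that sections (3) and (4) both assign to the key `naive_rag_response`, so in a single `PROMPTS` dict the second definition silently overwrites the first. One way to keep both is to register them under distinct keys and pick one at call time. The sketch below uses abbreviated stand-in templates and key names of our own (`_strict` / `_flexible` suffixes), not LightRAG's official naming.

```python
# Abbreviated stand-ins for the two full templates shown above.
PROMPTS = {
    "naive_rag_response_strict": (
        "Answer ONLY from the Document Chunks:\n{content_data}\n"
        "Question: {user_prompt}"
    ),
    "naive_rag_response_flexible": (
        "Prefer the Document Chunks, but you may add clearly labelled "
        "general knowledge:\n{content_data}\nQuestion: {user_prompt}"
    ),
}

def build_prompt(mode, content_data, user_prompt):
    """Select the strict or flexible template; mode is 'strict' or 'flexible'."""
    return PROMPTS[f"naive_rag_response_{mode}"].format(
        content_data=content_data, user_prompt=user_prompt
    )

print(build_prompt("strict", "[mock chunks]", "What is Python used for?"))
```

With this layout, the strict prompt can be the default and the flexible one can be opted into for exploratory questions.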
2) Corresponding Chinese prompt:
PROMPTS["naive_rag_response"] = """---角色---
你是一名乐于助人的助手,负责根据下方提供的文档块(Document Chunks,JSON 格式)回答用户查询。
---目标---
根据文档块及对话历史生成简明且有信息量的回答。尽可能使用文档块中的信息。
如果文档块信息不足,可结合一般知识或作出合理推测,但需明确标注哪些内容基于推测或额外知识。
---对话历史---
{history}
---文档块(DC)---
{content_data}
---响应指南---
**1. 内容与遵循:**
- 优先使用提供的文档块信息。
- 在使用一般知识或推测内容时,需明确标注其为推测或一般知识。
- 保证回答与对话历史的连续性和逻辑一致性。
**2. 格式与语言:**
- 使用 Markdown 格式,并添加适当的章节标题。
- 回答语言应与用户问题语言一致。
- 目标格式和长度:{response_type}
**3. 引用 / 参考文献:**
- 在回答末尾的“References”部分,最多引用 5 个最相关的文档块来源。
- 引用格式为:`[DC] <文件路径或文档名称>`。
- 对基于一般知识或推测的内容不进行引用,但需明确标注。
---用户上下文---
- 额外用户提示:{user_prompt}
---回答---
输出:
"""
That completes our walkthrough of all the LightRAG prompts. Next, we will put together a simple, sensible framework that actually uses them.