Building an Efficient RAG Question-Answering System from 0 to 1: A Hands-On Guide to LangChain + Ollama + Chroma


The_Thieves · Published 2025-04-06 13:15:03

I. Introduction: RAG, a Technique for Breaking Through the Knowledge Boundary of LLMs

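In enterprise AI applications, large language models (LLMs) commonly run into two pain points: stale knowledge (for example, training data that stops in 2023) and missing domain knowledge (for example, internal company documents the model never saw). Retrieval-augmented generation (RAG) addresses both by pairing the LLM with an external knowledge base: relevant passages are retrieved first and then handed to the model as context. This closes the loop between "model reasoning + fact retrieval" and significantly improves the accuracy and trustworthiness of the answers.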

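This article uses LangChain as the framework, together with lightweight models served locally by Ollama and the Chroma vector database, to build a RAG question-answering system over a local knowledge base from scratch. The code has been tested in practice, and the article includes a guide to avoiding module-migration pitfalls as well as performance-tuning tips, so AI developers and enthusiasts can get started quickly.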

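II. Environment Setup and Dependencies

1. Choosing the core components

Model layer: Ollama (runs locally; compatible with open-source models such as Qwen and Llama)
Vector database: Chroma (lightweight, easy to deploy, exposes an HTTP interface)
Framework: LangChain (provides standardized interfaces for RAG components)

2. Installing dependencies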

bash

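# Core framework (the `langchain` package provides `hub` and the summarize chain used later)
pip install langchain langchain-core langchain-community

# Models and vector store
pip install -U langchain-ollama langchain-chroma chromadb

# Document processing
pip install langchain-text-splitters python-multipart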

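3. Preparing the models

bash

# Start the Ollama service
ollama serve

# Pull a chat model (choose according to your hardware)
ollama pull qwen2.5:0.5b  # lightweight model (recommended for Mac/CPU)
ollama pull qwen2:7b       # mid-sized model (needs a GPU)

# Pull the embedding model used when building the vector store
ollama pull nomic-embed-text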

III. Core RAG Implementation

1. Loading and preprocessing documents

python

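from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every .txt file in the given directory
loader = DirectoryLoader(
    "/path/to/knowledge_base",
    glob="*.txt",
    loader_cls=TextLoader
)
documents = loader.load()

# Split the text (balance context coherence against token consumption)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,      # maximum characters per chunk
    chunk_overlap=20,    # characters shared by adjacent chunks
    length_function=len
)
splits = text_splitter.split_documents(documents)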
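
To make the two splitter parameters concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a simplification, not LangChain's actual algorithm: RecursiveCharacterTextSplitter additionally tries to break on separators such as paragraph and line breaks. The function name `sliding_chunks` and the sample text are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 20) -> list[str]:
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence that
    straddles a boundary still appears whole in at least one chunk when it
    is shorter than the overlap.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Hypothetical 1200-character sample document
sample = "".join(str(i % 10) for i in range(1200))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])  # [500, 500, 240]
```

A larger overlap keeps more shared context between neighbouring chunks, at the cost of storing and embedding more duplicated text.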

2. Initializing the vector database

python

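import chromadb
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

# Connect to a running Chroma server (for example, one started with `chroma run`)
chroma = chromadb.HttpClient(host="localhost", port=8000)

# Drop the old collection if it exists, then recreate it
try:
    chroma.delete_collection(name="ragdb")
except Exception:  # raised when "ragdb" does not exist yet; the exact type varies by chromadb version
    pass
collection = chroma.get_or_create_collection(
    name="ragdb",
    metadata={"hnsw:space": "cosine"}  # use cosine similarity
)

# Build the vector store
db = Chroma(
    client=chroma,
    collection_name="ragdb",
    embedding_function=OllamaEmbeddings(model="nomic-embed-text:latest")
)
db.add_documents(splits)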
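
As a rough illustration of what the cosine setting means, the sketch below ranks a few documents against a query by cosine similarity using only the standard library. The three-dimensional vectors and the names `store` and `query` are made up for illustration; real embeddings have hundreds of dimensions, and Chroma uses an approximate HNSW index rather than the exhaustive comparison shown here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values)
store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.6, 0.8, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]

# Rank document names from most to least similar to the query
ranked = sorted(store, key=lambda name: cosine_similarity(query, store[name]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b', 'doc_c']
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common choice for text embeddings.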

3. Building the RAG chain and interacting with it

python

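from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_ollama import OllamaLLM as Ollama

# Pull the RAG prompt template from LangChain Hub
prompt = hub.pull("rlm/rag-prompt")

# Define the RAG pipeline
rag_chain = (
    {
        # Retrieve documents and join their text into one context string
        "context": db.as_retriever() | (lambda docs: "\n\n".join(doc.page_content for doc in docs)),
        "question": RunnablePassthrough()
    }
    | prompt
    | Ollama(model="qwen2.5:0.5b")
    | StrOutputParser()
)

# Interactive loop (type "exit" to quit)
while True:
    user_input = input("Question: ")
    if user_input.lower() == 'exit':
        break
    response = rag_chain.invoke(user_input)
    print("AI assistant:", response)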
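
The data flow inside the chain can be sketched without any LangChain dependency: the question goes to the retriever, the retrieved texts are joined into one context string, and both values fill a prompt template that is then sent to the model. Everything below is a stand-in for illustration: `fake_retriever`, `Doc`, and `TEMPLATE` are hypothetical, and the template is not the actual text of rlm/rag-prompt.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    page_content: str  # mirrors the attribute name used by LangChain documents


def fake_retriever(question: str) -> list[Doc]:
    # Hypothetical stand-in for db.as_retriever(): always returns the same documents
    return [Doc("Chroma is a vector database."), Doc("Ollama serves local models.")]


def format_docs(docs: list[Doc]) -> str:
    # Same joining logic as the lambda in the chain
    return "\n\n".join(doc.page_content for doc in docs)


# Illustrative template only
TEMPLATE = "Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"


def build_prompt(question: str) -> str:
    # The dict step of the chain: one input fans out into "context" and "question"
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    return TEMPLATE.format(**inputs)


prompt_text = build_prompt("What is Chroma?")
print(prompt_text)
```

In the real chain the resulting prompt is passed to the Ollama model, and StrOutputParser turns the model output into a plain string.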

IV. Common Problems and Solutions

1. Errors after module migration

python

# Old imports (deprecated)
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma

# New imports (recommended; use the dedicated integration packages)
from langchain_ollama import OllamaLLM as Ollama
from langchain_chroma import Chroma

2. LangSmith API key warning

bash

# Set the environment variable permanently (use ~/.zshrc instead if your shell is zsh)
echo "export LANGCHAIN_API_KEY=your_api_key" >> ~/.bash_profile
source ~/.bash_profile

3. Persisting Chroma data

python

import chromadb

# Use a persistent client that stores data on local disk
chroma = chromadb.PersistentClient(path="/path/to/chromadb_data")

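V. Performance Optimization and Advanced Techniques

1. Tuning the retriever

python

# Use MMR (maximal marginal relevance) search: fetch 10 candidates, then return the 5 that balance relevance and diversity.
# Note: MMR re-ranks vector search results; it is not hybrid (vector + BM25) retrieval.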
retriever = db.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 10}
)
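
To show what the "mmr" search type is doing conceptually, here is a minimal pure-Python sketch of maximal marginal relevance. It is a textbook-style version under stated assumptions, not LangChain's or Chroma's implementation; the vectors are made up, and `lambda_mult=0.5` is chosen here simply to weight relevance and diversity equally.

```python
import math


def cos(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query: list[float], candidates: list[list[float]], k: int, lambda_mult: float = 0.5) -> list[int]:
    """Return indices of k candidates, trading off relevance against redundancy."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cos(query, candidates[i])
            # Similarity to the closest already-selected candidate (0 if none selected yet)
            redundancy = max((cos(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],    # 0: most relevant
    [1.0, 0.12],   # 1: near-duplicate of 0
    [1.0, -0.5],   # 2: less relevant, but covers a different direction
]
picked = mmr(query, candidates, k=2)
print(picked)  # [0, 2]
```

A plain top-2 similarity search would return candidates 0 and 1, which carry almost the same information; MMR skips the near-duplicate in favour of candidate 2.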

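2. Context compression

python

# Summarize the retrieved documents with the LLM
# (legacy chain; the import path may differ in newer LangChain releases)
from langchain.chains.summarize import load_summarize_chain

summarizer = load_summarize_chain(
    llm=Ollama(model="qwen2.5:0.5b"),
    chain_type="map_reduce"
)

docs = db.as_retriever().invoke("your question")  # documents to compress
context = summarizer.invoke({"input_documents": docs})["output_text"]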

3. Multimodal support

python

# Load an image (requires the `unstructured` package and its image dependencies, including Pillow)
from langchain_community.document_loaders import UnstructuredImageLoader

image_loader = UnstructuredImageLoader("/path/to/image.png")
image_docs = image_loader.load()

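VI. Extended Application Scenarios

1. Enterprise knowledge base

Support PDF, Word, CSV, and other formats (swap in the corresponding Loader)
Add user permission control (via Chroma metadata)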
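
The idea behind metadata-based permission control can be sketched as follows: each chunk carries a metadata field describing who may see it, and retrieval only considers chunks that match the current user. The sketch is pure Python and purely illustrative; the `department` field, the `Chunk` class, and `visible_chunks` are hypothetical names, not part of Chroma or LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


# Hypothetical knowledge-base chunks tagged with the department allowed to read them
chunks = [
    Chunk("Public holiday schedule", {"department": "all"}),
    Chunk("Salary bands for 2025", {"department": "hr"}),
    Chunk("Deployment runbook", {"department": "engineering"}),
]


def visible_chunks(all_chunks: list[Chunk], user_department: str) -> list[Chunk]:
    """Keep only chunks that are public or belong to the user's department."""
    allowed = {"all", user_department}
    return [c for c in all_chunks if c.metadata.get("department") in allowed]


visible = [c.text for c in visible_chunks(chunks, "engineering")]
print(visible)  # ['Public holiday schedule', 'Deployment runbook']
```

With a real Chroma store, the same restriction would normally be pushed into the query as a metadata filter (for example through the retriever's `search_kwargs`) so that filtering happens before ranking; check the Chroma documentation for the filter syntax supported by your version.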

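2. Real-time question answering

python

# Combine with a search engine (requires `pip install google-search-results` and the SERPAPI_API_KEY environment variable)
from langchain_community.utilities import SerpAPIWrapper
from langchain_core.documents import Document

query = "Beijing housing price trends in 2024"
search = SerpAPIWrapper()
# Wrap the search result text in a Document so it can be fed into the RAG pipeline
web_docs = [Document(page_content=search.run(query), metadata={"source": "serpapi"})]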

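3. Code question answering

python

# Load a code repository (requires GitPython)
from langchain_community.document_loaders import GitLoader

loader = GitLoader(
    clone_url="https://github.com/langchain-ai/langchain",  # remote repository to clone
    repo_path="./langchain_repo",  # local directory that receives the clone
    branch="master",
    file_filter=lambda file_path: file_path.endswith(".py")
)
code_docs = loader.load()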

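VII. Summary and Recommended Resources

Key component comparison

Component	Strengths	Best suited for
Ollama	Local deployment, wide model selection	Small and mid-sized businesses / individual developers
Chroma	Lightweight, HTTP interface	Rapid prototyping
LangChain	Standardized components, mature ecosystem	Complex RAG system development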

Performance comparison test

Model	Context window	Generation speed (token/s)	Memory usage
qwen2.5:0.5b	2048	15-20	1.2GB
qwen2:7b	8192	5-8	7GB
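
As a back-of-the-envelope reading of the speed column, the sketch below converts the token/s ranges from the table into the time needed to generate one answer. The 200-token answer length is an assumption made for illustration, and the estimate ignores prompt processing and retrieval time.

```python
def answer_seconds(answer_tokens: int, tokens_per_second: tuple[float, float]) -> tuple[float, float]:
    """Return (best_case, worst_case) generation time in seconds."""
    slow, fast = tokens_per_second
    return answer_tokens / fast, answer_tokens / slow


# token/s ranges taken from the table above
speeds = {"qwen2.5:0.5b": (15, 20), "qwen2:7b": (5, 8)}
ANSWER_TOKENS = 200  # assumed answer length, for illustration only

for model, speed in speeds.items():
    best, worst = answer_seconds(ANSWER_TOKENS, speed)
    print(f"{model}: {best:.1f}-{worst:.1f} s")
# qwen2.5:0.5b: 10.0-13.3 s
# qwen2:7b: 25.0-40.0 s
```

Under these assumptions the smaller model answers roughly two to four times faster, which is the trade-off to weigh against answer quality.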

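VIII. FAQ

Q: How do I handle Chinese documents?
A: Use RecursiveCharacterTextSplitter and pass separators that include Chinese punctuation (such as "。", "！", "？"). Set is_separator_regex=True only when the separators are written as regular expressions.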
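
To illustrate why Chinese punctuation matters for splitting, here is a minimal sketch that cuts text after sentence-ending punctuation using only the `re` module. It is not LangChain's implementation, and the function name `split_chinese_sentences` is hypothetical.

```python
import re


def split_chinese_sentences(text: str) -> list[str]:
    """Split after Chinese sentence-ending punctuation, keeping the punctuation with its sentence."""
    # The lookbehind matches the empty position right after 。！？； (needs Python 3.7+)
    parts = re.split(r"(?<=[。！？；])", text)
    return [p.strip() for p in parts if p.strip()]


sentences = split_chinese_sentences("检索增强生成结合了检索与生成。它能减少幻觉吗？可以！")
print(sentences)
# ['检索增强生成结合了检索与生成。', '它能减少幻觉吗？', '可以！']
```

Chinese text has no spaces between words, so a splitter that relies on whitespace tends to cut in the middle of a sentence; with RecursiveCharacterTextSplitter, the usual approach is to supply punctuation like this through its `separators` argument.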

Q: How do I debug the RAG chain?
A: Enable LangSmith tracing by adding the following to your code:

python

import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"

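Q: How do I choose a vector database?
A:
- Lightweight: Chroma (local) / Pinecone (cloud)
- High performance: Milvus (open source) / Zilliz Cloud (enterprise)

With this hands-on guide, you can quickly build a RAG question-answering system over a local knowledge base and improve its performance with the optimization strategies above. Feel free to share your own experience or suggest improvements in the comments!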
