[Release Update] Deploying the open-source MinerU 2.5 project locally on the Huawei Ascend 310 and 910B series
This document describes how to upgrade MinerU to version 2.5 and deploy the related services. The new release switches to the vLLM engine for better NPU compatibility and must be built on the vllm-ascend v0.10.2rc1 image. Deployment steps: 1) pull the base image; 2) start a container with the devices configured; 3) install the MinerU core dependencies and models; 4) start the services from the command line or a Compose file. A complete Docker Compose example is provided, supporting multi-device parallelism and health checks, along with notes on the GPU memory utilization settings.
Preface: MinerU has released version 2.5. On NPUs, the new release's VLM performance mode switches from the sglang engine to the vLLM engine, which improves architecture compatibility. This document covers how to upgrade to 2.5 and how to start and stop the related services.
Build and deployment steps
I. Build the image
To use MinerU 2.5, you first need a vLLM build that supports it; the minimum requirement is vllm-ascend 0.10.1.
The official registry currently provides vllm-ascend v0.10.2rc1, which can run the 2.5 dependencies, so use it as the base image:
docker pull quay.io/ascend/vllm-ascend:v0.10.2rc1 (Ubuntu and openEuler variants are available).
Requirements: Python >= 3.9, < 3.12
CANN >= 8.2.rc1 (see the Ascend HDK version reference for details)
PyTorch >= 2.7.1, torch-npu >= 2.7.1.dev20250724
The vllm-ascend v0.10.2rc1 image satisfies all of the requirements above.
Compared with the 2.0 deployment described in my earlier post, building on the vLLM image eliminates most dependency version conflicts.
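The version requirements above can be spot-checked from inside the image with a small shell helper. This is a minimal sketch; `ver_ge` is a hypothetical helper name, and it assumes `sort -V` (GNU coreutils) is available in the image:

```shell
# ver_ge A B: succeeds when version A >= version B (relies on sort -V ordering)
ver_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example: compare the torch version against the 2.7.1 minimum.
# Inside the container you might obtain the real value with:
#   torch_ver=$(pip show torch | awk '/^Version:/{print $2}')
torch_ver="2.7.1"
if ver_ge "$torch_ver" "2.7.1"; then
  echo "torch OK"
fi
```

The same check can be repeated for torch-npu and the Python interpreter before installing MinerU, so a version mismatch surfaces early rather than as an obscure import error.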
II. Run the container
docker run -itd --shm-size="500g" -p 9005:9005 --privileged \
  --name vllm \
  --restart=always \
  --device=/dev/davinci0 \
  --device=/dev/davinci1 \
  --device=/dev/davinci_manager \
  --device=/dev/devmm_svm \
  --device=/dev/hisi_hdc \
  -v /usr/local/Ascend/driver/:/usr/local/Ascend/driver/ \
  -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
  -v /usr/local/sbin/:/usr/local/sbin/ \
  -v /var/log/npu/slog/:/var/log/npu/slog \
  -v /var/log/npu/profiling/:/var/log/npu/profiling \
  -v /var/log/npu/dump/:/var/log/npu/dump \
  -v /var/log/npu/:/usr/slog \
  -v /etc/hccn.conf:/etc/hccn.conf \
  quay.io/ascend/vllm-ascend:v0.10.2rc1 \
  /bin/bash
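If your host has more (or fewer) NPUs, the list of `--device=/dev/davinciN` flags must match. As a sketch, a hypothetical helper `gen_device_flags` could generate the flag list for the first N devices instead of writing each line by hand:

```shell
# gen_device_flags N: print --device flags for /dev/davinci0 .. /dev/davinci(N-1)
gen_device_flags() {
  flags=""
  i=0
  while [ "$i" -lt "$1" ]; do
    flags="$flags --device=/dev/davinci$i"
    i=$((i + 1))
  done
  # unquoted expansion collapses the leading space into single separators
  echo $flags
}

gen_device_flags 2   # prints: --device=/dev/davinci0 --device=/dev/davinci1
```

The output can then be spliced into the docker run command above (for example via `docker run ... $(gen_device_flags 2) ...`) in place of the hand-written `--device` lines.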
III. Install the MinerU core dependencies
1. Install the dependencies:
pip install -U 'mineru[core]' -i https://mirrors.aliyun.com/pypi/simple
2. Download the models:
mineru-models-download -s modelscope -m all
3. Install the system dependencies
CentOS: yum install -y mesa-libGL mesa-libGL-devel libXrender libSM libXext tesseract tesseract-langpack-chi_sim
Ubuntu:
apt-get update
# OpenGL & X11 libraries
apt-get install -y libgl1-mesa-glx libgl1-mesa-dev \
libxrender1 libsm6 libxext6
# OCR & Chinese language pack
apt-get install -y tesseract-ocr tesseract-ocr-chi-sim
4. Pin the numpy version to avoid dependency conflicts:
pip install numpy==1.26.4
IV. Start the services
Option 1: start from inside the container (exposes a FastAPI service):
export MINERU_MODEL_SOURCE=local
mineru-api --host 0.0.0.0 --port 9005 --device npu
Option 2: start from the host with a Compose file (recommended):
services:
  mineru-vllm-server:
    image: mineru-ascend:latest
    container_name: mineru-vllm-server
    restart: always
    profiles: ["vllm-server"]
    ports:
      - 30000:30000
    environment:
      MINERU_MODEL_SOURCE: local
      ASCEND_RT_VISIBLE_DEVICES: "0,1,2,3"
    entrypoint: mineru-vllm-server
    command:
      --host 0.0.0.0
      --port 30000
      --gpu-memory-utilization 0.85
      --data-parallel-size 4
      # --data-parallel-size 2 # If using multiple devices, increase throughput using vllm's multi-device parallel mode
      # --gpu-memory-utilization 0.5 # If running on a single device and encountering memory shortage, reduce the KV cache size with this parameter; if issues persist, lower it further to 0.4 or below
    ulimits:
      memlock: -1
      stack: 67108864
    ipc: host
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:30000/health || exit 1"]
    devices:
      - /dev/davinci0:/dev/davinci0
      - /dev/davinci1:/dev/davinci1
      - /dev/davinci2:/dev/davinci2
      - /dev/davinci3:/dev/davinci3
      - /dev/davinci_manager:/dev/davinci_manager
      - /dev/devmm_svm:/dev/devmm_svm
      - /dev/hisi_hdc:/dev/hisi_hdc
    volumes:
      - /usr/local/Ascend/driver/:/usr/local/Ascend/driver/
      - /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/
      - /usr/local/sbin/:/usr/local/sbin/
      - /var/log/npu/slog/:/var/log/npu/slog
      - /var/log/npu/profiling/:/var/log/npu/profiling
      - /var/log/npu/dump/:/var/log/npu/dump
      - /var/log/npu/:/usr/slog
      - /etc/hccn.conf:/etc/hccn.conf
    privileged: true
  mineru-api:
    image: mineru-ascend:latest
    container_name: mineru-api
    restart: always
    profiles: ["api"]
    ports:
      - 8000:8000
    environment:
      MINERU_MODEL_SOURCE: local
      ASCEND_RT_VISIBLE_DEVICES: "0"
    entrypoint: mineru-api
    command:
      --host 0.0.0.0
      --port 8000
      # --backend vlm-vllm-async-engine
      # --backend vlm-http-client
      # --server-url http://mineru-vllm-server:30000
      # Optional: to run the vLLM async engine in the same process, switch to --backend vlm-vllm-async-engine
    ulimits:
      memlock: -1
      stack: 67108864
    ipc: host
    devices:
      - /dev/davinci0:/dev/davinci0
      - /dev/davinci_manager:/dev/davinci_manager
      - /dev/devmm_svm:/dev/devmm_svm
      - /dev/hisi_hdc:/dev/hisi_hdc
    volumes:
      - /usr/local/Ascend/driver/:/usr/local/Ascend/driver/
      - /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/
      - /usr/local/sbin/:/usr/local/sbin/
      - /var/log/npu/slog/:/var/log/npu/slog
      - /var/log/npu/profiling/:/var/log/npu/profiling
      - /var/log/npu/dump/:/var/log/npu/dump
      - /var/log/npu/:/usr/slog
      - /etc/hccn.conf:/etc/hccn.conf
    privileged: true
  mineru-gradio:
    image: mineru-ascend:latest
    container_name: mineru-gradio
    restart: always
    profiles: ["gradio"]
    ports:
      - 7860:7860
    environment:
      MINERU_MODEL_SOURCE: local
      ASCEND_RT_VISIBLE_DEVICES: "0"
    entrypoint: mineru-gradio
    command:
      --server-name 0.0.0.0
      --server-port 7860
      --enable-vllm-engine true # Enable the vllm engine for Gradio
      # --enable-api false # If you want to disable the API, set this to false
      # --max-convert-pages 20 # If you want to limit the number of pages for conversion, set this to a specific number
      # parameters for vllm-engine
      # --data-parallel-size 2 # If using multiple devices, increase throughput using vllm's multi-device parallel mode
      # --gpu-memory-utilization 0.5 # If running on a single device and encountering memory shortage, reduce the KV cache size with this parameter; if issues persist, lower it further to 0.4 or below
    ulimits:
      memlock: -1
      stack: 67108864
    ipc: host
    devices:
      - /dev/davinci0:/dev/davinci0
      - /dev/davinci_manager:/dev/davinci_manager
      - /dev/devmm_svm:/dev/devmm_svm
      - /dev/hisi_hdc:/dev/hisi_hdc
    volumes:
      - /usr/local/Ascend/driver/:/usr/local/Ascend/driver/
      - /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/
      - /usr/local/sbin/:/usr/local/sbin/
      - /var/log/npu/slog/:/var/log/npu/slog
      - /var/log/npu/profiling/:/var/log/npu/profiling
      - /var/log/npu/dump/:/var/log/npu/dump
    privileged: true
The services above are the official API service and the vLLM server.
Related commands:
docker-compose -f docker-compose-npu.yaml --profile api up -d
docker-compose -f docker-compose-npu.yaml --profile api --profile vllm-server up -d
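The vLLM server only answers `/health` once the model has finished loading, so a script that chains the two profiles may want to wait for readiness first. A minimal sketch, assuming `curl` is installed on the host; `wait_for_url` is a hypothetical helper name:

```shell
# wait_for_url URL TRIES: poll URL about once per second, up to TRIES attempts;
# returns 0 as soon as curl gets a successful response, 1 on timeout.
wait_for_url() {
  url=$1
  tries=$2
  n=0
  while [ "$n" -lt "$tries" ]; do
    if curl -sf "$url" > /dev/null 2>&1; then
      return 0
    fi
    n=$((n + 1))
    sleep 1
  done
  return 1
}

# e.g.: wait_for_url http://localhost:30000/health 120 && echo "vllm-server ready"
```

This mirrors what the Compose `healthcheck` does internally, but makes the readiness signal usable from the host shell before firing requests at the API.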
V. Invoking the vLLM inference mode
curl --location 'http://127.0.0.1:8000/file_parse' \
--form 'files=@"./demo1.pdf"' \
--form 'backend="vlm-http-client"' \
--form 'server_url=http://mineru-vllm-server:30000' \
--form 'parse_method="auto"' \
--form 'formula_enable="true"' \
--form 'table_enable="true"' \
--form 'start_page_id="0"'
Note: this call uses the VLM performance mode. The backend parameter is optional and several parsing backends are supported. That's all for now; if you have any questions, leave a comment below and I'll check back and reply from time to time.
Modelers.cn is a neutral, public-interest AI community that provides hosting, showcasing, and collaborative application services for AI tools, models, and data, offering an open platform for AI developers and enthusiasts to learn and exchange. The community is run by a council and is jointly built, operated, and shared by the whole industry chain to promote a thriving domestic AI ecosystem.