Preface: MinerU has released version 2.5. The new release supports the VLM performance mode on NPU and switches the inference engine from sglang to vLLM, which improves architectural compatibility. This document explains how to upgrade to 2.5 and how to start and stop the related services.

Build and Deployment Steps

1. Build the Image

To run MinerU 2.5 you first need a vLLM build that supports it; the minimum supported version is vllm-ascend 0.10.1. The official vllm-ascend registry currently provides v0.10.2rc1, which can serve as the base image for MinerU 2.5's dependencies and is available in both Ubuntu and openEuler variants:

docker pull quay.io/ascend/vllm-ascend:v0.10.2rc1

Requirements:
- Python >= 3.9, < 3.12
- CANN >= 8.2.rc1 (see the Ascend HDK version reference)
- PyTorch >= 2.7.1, torch-npu >= 2.7.1.dev20250724

The vllm-ascend v0.10.2rc1 image satisfies all of the above. Compared with the 2.0 setup I described earlier, using the vLLM image as the base eliminates most dependency-version conflicts.
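The version requirements above can be sanity-checked from inside the container. The sketch below verifies only the Python interpreter range and reports the installed torch version (torch-npu is assumed to be importable only inside the image):

```python
import sys

def meets_python_requirement(version=None):
    """Return True if the interpreter is in MinerU 2.5's range: >= 3.9, < 3.12."""
    major, minor = (version or sys.version_info)[:2]
    return (3, 9) <= (major, minor) < (3, 12)

if __name__ == "__main__":
    print("Python OK" if meets_python_requirement() else "unsupported Python version")
    try:
        import torch  # torch-npu extends this install inside the image
        print("torch", torch.__version__)
    except ImportError:
        print("torch is not installed")
```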

2. Run the Container

docker run -itd --shm-size="500g" -p 9005:9005 --privileged \
  --name vllm \
  --restart=always \
  --device=/dev/davinci0 \
  --device=/dev/davinci1 \
  --device=/dev/davinci_manager \
  --device=/dev/devmm_svm \
  --device=/dev/hisi_hdc \
  -v /usr/local/Ascend/driver/:/usr/local/Ascend/driver/ \
  -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
  -v /usr/local/sbin/:/usr/local/sbin/ \
  -v /var/log/npu/slog/:/var/log/npu/slog \
  -v /var/log/npu/profiling/:/var/log/npu/profiling \
  -v /var/log/npu/dump/:/var/log/npu/dump \
  -v /var/log/npu/:/usr/slog \
  -v /etc/hccn.conf:/etc/hccn.conf \
  quay.io/ascend/vllm-ascend:v0.10.2rc1 \
  /bin/bash

3. Install the MinerU Core Dependencies

1. Install the package:
pip install -U 'mineru[core]' -i https://mirrors.aliyun.com/pypi/simple
2. Download the models:
mineru-models-download -s modelscope -m all
3. Install the required system packages:

CentOS:
yum install -y mesa-libGL mesa-libGL-devel libXrender libSM libXext tesseract tesseract-langpack-chi_sim
Ubuntu:
apt-get update
# OpenGL & X11 libraries
apt-get install -y libgl1-mesa-glx libgl1-mesa-dev \
                   libxrender1 libsm6 libxext6
# OCR & Chinese language pack
apt-get install -y tesseract-ocr tesseract-ocr-chi-sim
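Once the system packages are installed, a quick check that the OCR binary actually landed on PATH can save debugging later (a minimal sketch; among the packages above, only tesseract ships a CLI):

```python
import shutil

def find_missing(executables):
    """Return the subset of the given executable names not found on PATH."""
    return [name for name in executables if shutil.which(name) is None]

if __name__ == "__main__":
    missing = find_missing(["tesseract"])
    print("missing: " + ", ".join(missing) if missing else "all binaries found")
```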

4. Fix a version conflict by pinning NumPy:
pip install numpy==1.26.4

4. Start the Services

① Start inside the container (this exposes the FastAPI service):
export MINERU_MODEL_SOURCE=local
mineru-api --host 0.0.0.0 --port 9005 --device npu
② Start from outside with a compose file (recommended!)

services:
  mineru-vllm-server:
    image: mineru-ascend:latest
    container_name: mineru-vllm-server
    restart: always
    profiles: ["vllm-server"]
    ports:
      - 30000:30000
    environment:
      MINERU_MODEL_SOURCE: local
      ASCEND_RT_VISIBLE_DEVICES: "0,1,2,3"
    entrypoint: mineru-vllm-server
    command:
      --host 0.0.0.0
      --port 30000
      --gpu-memory-utilization 0.85
      --data-parallel-size 4
      # --data-parallel-size 2  # If using multiple GPUs, increase throughput using vllm's multi-GPU parallel mode
      # --gpu-memory-utilization 0.5  # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
    ulimits:
      memlock: -1
      stack: 67108864
    ipc: host
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:30000/health || exit 1"]
    devices:
      - /dev/davinci0:/dev/davinci0
      - /dev/davinci1:/dev/davinci1
      - /dev/davinci2:/dev/davinci2
      - /dev/davinci3:/dev/davinci3
      - /dev/davinci_manager:/dev/davinci_manager
      - /dev/devmm_svm:/dev/devmm_svm
      - /dev/hisi_hdc:/dev/hisi_hdc
    volumes:
      - /usr/local/Ascend/driver/:/usr/local/Ascend/driver/
      - /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/
      - /usr/local/sbin/:/usr/local/sbin/
      - /var/log/npu/slog/:/var/log/npu/slog
      - /var/log/npu/profiling/:/var/log/npu/profiling
      - /var/log/npu/dump/:/var/log/npu/dump
      - /var/log/npu/:/usr/slog
      - /etc/hccn.conf:/etc/hccn.conf
    privileged: true

  mineru-api:
    image: mineru-ascend:latest
    container_name: mineru-api
    restart: always
    profiles: ["api"]
    ports:
      - 8000:8000
    environment:
      MINERU_MODEL_SOURCE: local
      ASCEND_RT_VISIBLE_DEVICES: "0"
    entrypoint: mineru-api
    command:
      --host 0.0.0.0
      --port 8000
      # --backend vlm-vllm-async-engine
      # --backend vlm-http-client
      # --server-url http://mineru-vllm-server:30000
      # Optional: to run the vLLM async engine in-process, switch to --backend vlm-vllm-async-engine
    ulimits:
      memlock: -1
      stack: 67108864
    ipc: host
    devices:
      - /dev/davinci0:/dev/davinci0
      - /dev/davinci_manager:/dev/davinci_manager
      - /dev/devmm_svm:/dev/devmm_svm
      - /dev/hisi_hdc:/dev/hisi_hdc
    volumes:
      - /usr/local/Ascend/driver/:/usr/local/Ascend/driver/
      - /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/
      - /usr/local/sbin/:/usr/local/sbin/
      - /var/log/npu/slog/:/var/log/npu/slog
      - /var/log/npu/profiling/:/var/log/npu/profiling
      - /var/log/npu/dump/:/var/log/npu/dump
      - /var/log/npu/:/usr/slog
      - /etc/hccn.conf:/etc/hccn.conf
    privileged: true

  mineru-gradio:
    image: mineru-ascend:latest
    container_name: mineru-gradio
    restart: always
    profiles: ["gradio"]
    ports:
      - 7860:7860
    environment:
      MINERU_MODEL_SOURCE: local
      ASCEND_RT_VISIBLE_DEVICES: "0"
    entrypoint: mineru-gradio
    command:
      --server-name 0.0.0.0
      --server-port 7860
      --enable-vllm-engine true  # Enable the vllm engine for Gradio
      # --enable-api false  # If you want to disable the API, set this to false
      # --max-convert-pages 20  # If you want to limit the number of pages for conversion, set this to a specific number
      # parameters for vllm-engine
      # --data-parallel-size 2  # If using multiple GPUs, increase throughput using vllm's multi-GPU parallel mode
      # --gpu-memory-utilization 0.5  # If running on a single GPU and encountering VRAM shortage, reduce the KV cache size by this parameter, if VRAM issues persist, try lowering it further to `0.4` or below.
    ulimits:
      memlock: -1
      stack: 67108864
    ipc: host
    devices:
      - /dev/davinci0:/dev/davinci0
      - /dev/davinci_manager:/dev/davinci_manager
      - /dev/devmm_svm:/dev/devmm_svm
      - /dev/hisi_hdc:/dev/hisi_hdc
    volumes:
      - /usr/local/Ascend/driver/:/usr/local/Ascend/driver/
      - /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/
      - /usr/local/sbin/:/usr/local/sbin/
      - /var/log/npu/slog/:/var/log/npu/slog
      - /var/log/npu/profiling/:/var/log/npu/profiling
      - /var/log/npu/dump/:/var/log/npu/dump
    privileged: true

The services defined above are the official API service and the vLLM server.
Related commands:
docker-compose -f docker-compose-npu.yaml --profile api up -d
docker-compose -f docker-compose-npu.yaml --profile api --profile vllm-server up -d
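Model loading can take a while after `up -d`, so it helps to wait for the vLLM server's /health endpoint before sending jobs. A stdlib-only polling sketch (the URL assumes the 30000 port mapping from the compose file):

```python
import time
import urllib.error
import urllib.request

def wait_for_health(url, timeout=120.0, interval=3.0):
    """Poll `url` until it answers HTTP 200; return False if `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(interval)
    return False

# Example: wait_for_health("http://localhost:30000/health")
```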

5. Calling the vLLM Inference Mode

curl --location 'http://127.0.0.1:8000/file_parse' \
--form 'files=@"./demo1.pdf"' \
--form 'backend="vlm-http-client"' \
--form 'server_url=http://mineru-vllm-server:30000' \
--form 'parse_method="auto"' \
--form 'formula_enable="true"' \
--form 'table_enable="true"' \
--form 'start_page_id="0"' 
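The same call can be scripted in Python. Since /file_parse expects multipart/form-data, the sketch below builds the request body with only the standard library (field names mirror the curl example; with `requests` installed, `requests.post(url, files=..., data=...)` is simpler):

```python
import urllib.request
import uuid

def build_multipart(fields, files):
    """Encode string fields plus (name, filename, payload) file parts into a
    multipart/form-data body. Returns (content_type, body_bytes)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    for name, filename, payload in files:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream'
            f"\r\n\r\n".encode() + payload + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)

# Fields mirroring the curl example above.
FIELDS = {
    "backend": "vlm-http-client",
    "server_url": "http://mineru-vllm-server:30000",
    "parse_method": "auto",
    "formula_enable": "true",
    "table_enable": "true",
    "start_page_id": "0",
}

def parse_file(pdf_path, url="http://127.0.0.1:8000/file_parse"):
    """POST a PDF to the MinerU API and return the raw response bytes."""
    with open(pdf_path, "rb") as f:
        content_type, body = build_multipart(FIELDS, [("files", pdf_path, f.read())])
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```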

Note: the request above uses the VLM performance mode. The backend field is optional, and several parsing backends are supported. That's it for this guide; if you run into problems, leave a comment and the author will check in and reply from time to time.
