01 Model Introduction

Emu3.5: Native Multimodal Models are World Learners

Emu3.5 Team, BAAI

Project Page | 🤗HF Models | Paper

🔹 Core Concepts

🧠 Unified World Modeling: Predicts the next state jointly across vision and language, enabling coherent world modeling and generation.
🧩 End-to-End Pretraining: Trained with a unified next-token prediction objective over interleaved vision–language sequences.
📚 10T+ Multimodal Tokens: Pre-trained on over 10 trillion interleaved tokens from video frames and transcripts, capturing spatiotemporal structure.
🔄 Native Multimodal I/O: Processes and generates interleaved visual–text sequences without modality adapters or task-specific heads.
🎯 RL Post-Training: Large-scale reinforcement learning enhances reasoning, compositionality, and generation quality.
⚡ Discrete Diffusion Adaptation (DiDA): Converts sequential decoding into bidirectional parallel prediction, achieving ≈20× faster inference without performance loss.
🖼️ Versatile Generation: Excels in long-horizon vision–language generation, any-to-image (X2I) synthesis, and text-rich image creation.
🌐 Generalizable World Modeling: Enables spatiotemporally consistent world exploration and open-world embodied manipulation across diverse scenarios.
🏆 Performance Benchmark: Matches Gemini 2.5 Flash Image (Nano Banana) on image generation and editing, and outperforms it on interleaved generation tasks.

02 Quick Start

1 Prepare Resources

Hardware: Atlas 800I/800T A2 (64 GB) or Atlas 800I/800T A3

Run the following shell command to pull the vllm-ascend inference container image:
A2

docker pull quay.nju.edu.cn/ascend/vllm-ascend:v0.11.0rc3

A3

docker pull quay.nju.edu.cn/ascend/vllm-ascend:v0.11.0rc3-a3
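
To confirm the pull succeeded, you can list the local image (shown here for the shared repository; both the A2 and A3 tags will appear):

docker images quay.nju.edu.cn/ascend/vllm-ascend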

2 Download Weights

Weight name              Modelers community download link
Emu3.5                   link
Emu3.5-Image             link
Emu3.5-VisionTokenizer   link
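
Alternatively, the weights can be fetched programmatically with the openMind Hub client used by the Modelers community. The snippet below is a minimal sketch; it assumes the openmind_hub package is installed and that the repository id on Modelers is BAAI/Emu3.5-Image:

from openmind_hub import snapshot_download

# Download the image-generation weights into ./weights/Emu3.5-Image (repo id assumed)
snapshot_download(repo_id="BAAI/Emu3.5-Image", local_dir="./weights/Emu3.5-Image")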

3 Inference with transformers

(1) Create and enter the container

A2

docker run -it --net=host --shm-size=500g \
    --privileged \
    --name emu3.5 \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
    -v /usr/local/sbin:/usr/local/sbin:ro \
    -v /data:/data \
    quay.nju.edu.cn/ascend/vllm-ascend:v0.11.0rc3 bash

A3

docker run -it --net=host --shm-size=500g \
    --privileged \
    --name emu3.5 \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
    -v /usr/local/sbin:/usr/local/sbin:ro \
    -v /data:/data \
    quay.nju.edu.cn/ascend/vllm-ascend:v0.11.0rc3-a3 bash

(2) Set up the environment

cd /vllm-workspace/
git clone https://github.com/baaivision/Emu3.5.git
cd Emu3.5/
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install -r requirements/common.txt
pip install accelerate
pip install transformers==4.48.2

Modify line 38 of src/utils/model_utils.py (the flash-attn kernels are not available on NPU), changing

attn_implementation="flash_attention_2",

to

attn_implementation="eager",
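
Equivalently, the edit can be applied with a one-liner (assuming line 38 still matches the upstream source):

sed -i 's/attn_implementation="flash_attention_2"/attn_implementation="eager"/' src/utils/model_utils.py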

(3) Run inference

The example_config_*.py files are reference configurations for the different tasks. Before use, edit the following entries in the chosen configuration file:

model_path = "./weights/Emu3.5-Image"        # change to the actual weight path
vq_path = "./weights/Emu3.5-VisionTokenizer" # change to the actual tokenizer path
vq_device = "cuda:0"                         # change to vq_device = "npu:0"
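
For example, after editing, the relevant lines in configs/example_config_t2i.py might read (the /data paths are illustrative):

model_path = "/data/weights/Emu3.5-Image"
vq_path = "/data/weights/Emu3.5-VisionTokenizer"
vq_device = "npu:0"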

Inference commands:

# 🖼️ Text-to-Image (T2I) task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference.py --cfg configs/example_config_t2i.py

# 🔄 Any-to-Image (X2I) task
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 python inference.py --cfg configs/example_config_x2i.py

# 🎯 Visual Guidance task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference.py --cfg configs/example_config_visual_guidance.py

# 📖 Visual Narrative task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference.py --cfg configs/example_config_visual_narrative.py
# After running inference, the model will generate results in protobuf format (.pb files) for each input prompt.

4 Inference with vLLM

(1) Create and enter the container

A2

docker run -it --net=host --shm-size=500g \
    --privileged \
    --name emu3.5-vllm \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
    -v /usr/local/sbin:/usr/local/sbin:ro \
    -v /data:/data \
    quay.nju.edu.cn/ascend/vllm-ascend:v0.11.0rc3 bash

A3

docker run -it --net=host --shm-size=500g \
    --privileged \
    --name emu3.5-vllm \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
    -v /usr/local/sbin:/usr/local/sbin:ro \
    -v /data:/data \
    quay.nju.edu.cn/ascend/vllm-ascend:v0.11.0rc3-a3 bash

(2) Set up the environment

cd /vllm-workspace/
git clone https://github.com/baaivision/Emu3.5.git
cd Emu3.5/
python src/patch/apply.py
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install -r requirements/common.txt

Then apply the following code modifications:

  • (1) At line 701 of /vllm-workspace/vllm/vllm/v1/core/sched/scheduler.py, change
    return CachedRequestData(
              req_ids=req_ids,
              resumed_from_preemption=resumed_from_preemption,
              new_token_ids=new_token_ids,
              new_block_ids=new_block_ids,
              num_computed_tokens=num_computed_tokens,
          )
    
    to
    return CachedRequestData(
              req_ids=req_ids,
              resumed_from_preemption=resumed_from_preemption,
              new_token_ids=new_token_ids,
              new_block_ids=new_block_ids,
              num_computed_tokens=num_computed_tokens,
              sampling_params=None,
              hybrid_metadata=None
          )
    
  • (2) At line 58 of /vllm-workspace/vllm/vllm/v1/sample/logits_processor/builtin.py, change
    for index, params, _, _, _ in batch_update.added:
    to
    for index, params, _, _ in batch_update.added:
    
  • (3) At line 249 of /vllm-workspace/vllm/vllm/v1/sample/logits_processor/builtin.py, change
    for index, params, prompt_tok_ids, output_tok_ids, _ in batch_update.added
    to
    for index, params, prompt_tok_ids, output_tok_ids in batch_update.added
    
  • (4) In src/utils/model_utils.py at line 131, comment out "full_cuda_graph": True, and add "cudagraph_mode": "FULL_DECODE_ONLY", after it, as follows:
compilation_config={
            "full_cuda_graph": True,
            "backend": "cudagraph",
            "cudagraph_capture_sizes": [1, 2],
        },
to
compilation_config={
            #"full_cuda_graph": True,
            "cudagraph_mode":"FULL_DECODE_ONLY",
            "backend": "cudagraph",
            "cudagraph_capture_sizes": [1, 2],
        },

(3) Run inference

  • Set environment variables
    export VLLM_WORKER_MULTIPROC_METHOD="spawn"
    export TASK_QUEUE_ENABLE=1
    export CPU_AFFINITY_CONF=2
    # combine both allocator options in one export; a later export would overwrite the earlier one
    export PYTORCH_NPU_ALLOC_CONF="max_split_size_mb:250,expandable_segments:True"
    export HCCL_OP_EXPANSION_MODE="AIV"
    
  • Modify the configuration file

The example_config_*.py files are reference configurations for the different tasks. Before use, edit the same entries as in Section 3:

model_path = "./weights/Emu3.5-Image"        # change to the actual weight path
vq_path = "./weights/Emu3.5-VisionTokenizer" # change to the actual tokenizer path
vq_device = "cuda:0"                         # change to vq_device = "npu:0"
  • Inference commands
# 🖼️ Text-to-Image (T2I) task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference_vllm.py --cfg configs/example_config_t2i.py --tensor-parallel-size 2 --gpu-memory-utilization 0.8

# 🔄 Any-to-Image (X2I) task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference_vllm.py --cfg configs/example_config_x2i.py --tensor-parallel-size 2 --gpu-memory-utilization 0.7

# 🎯 Visual Guidance task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference_vllm.py --cfg configs/example_config_visual_guidance.py --tensor-parallel-size 2 --gpu-memory-utilization 0.8

# 📖 Visual Narrative task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference_vllm.py --cfg configs/example_config_visual_narrative.py --tensor-parallel-size 2 --gpu-memory-utilization 0.8
# After running inference, the model will generate results in protobuf format (.pb files) for each input prompt.

5 Convert Inference Results

Run the following command to convert the inference results into a visualizable form:

python src/utils/vis_proto.py --input <input_proto_path> [--output <output_dir>] [--video]
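
For example, to render the protobuf output of a T2I run (the paths are illustrative):

python src/utils/vis_proto.py --input outputs/t2i/result.pb --output outputs/t2i_vis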

6 Gradio Demo

We provide two Gradio Demos for different application scenarios:

Emu3.5-Image Demo, an interactive interface optimized for Text-to-Image (T2I) and Any-to-Image (X2I) tasks:

ASCEND_RT_VISIBLE_DEVICES=0,1 python gradio_demo_image.py --host 0.0.0.0 --port 7860

Emu3.5-Interleave Demo, an interactive interface for the interleaved tasks (Visual Guidance and Visual Narrative):

ASCEND_RT_VISIBLE_DEVICES=0,1 python gradio_demo_interleave.py --host 0.0.0.0 --port 7860
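
On a headless server, you can verify that a demo is up by probing the port passed via --port:

curl -I http://localhost:7860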

03 Quantization

Quantize the model using the modelslim quantization tool.

1 Get the modelslim Source Code

git clone https://gitcode.com/Ascend/msit.git

2 Integrate the Emu3.5 Model

(1) Implement the model quantization code

cd msit/msmodelslim/msmodelslim/model
mkdir emu3_5
cd emu3_5
cp -r <Emu3.5-source-path>/src/tokenizer_emu3_ibq/ ./
cp -r <Emu3.5-source-path>/src/emu3p5/modeling_emu3.py ./
cp -r <Emu3.5-source-path>/src/emu3p5/configuration_emu3.py ./

Then copy __init__.py and model_adapter.py into the emu3_5 directory.

(2) Add the quantization configuration file

Create quant.yaml and add the following content:

apiversion: modelslim_v1
spec:
  process:
    - type: "iter_smooth"                   
      alpha: 0.9                           
      scale_min: 1e-5                      
      symmetric: True                        
      enable_subgraph_type: 
        - 'norm-linear'
        - 'linear-linear'
        - 'ov'
        - 'up-down'
      include: 
        - "*"
      exclude:                               
        - "*self_attn*"
    - type: "quarot"
      online: False
      block_size: -1
      max_tp_size: 4
      down_proj_online_layers: [ ]
    - type: "linear_quant" # linear-layer quantization
      qconfig:
        act: # activation quantization
          scope: "per_token" # dynamic per-token quantization
          dtype: "int8" # 8-bit integer quantization
          symmetric: True # symmetric quantization
          method: "minmax" # minmax algorithm
        weight: # weight quantization
          scope: "per_channel" # per-channel quantization
          dtype: "int8" # 8-bit integer quantization
          symmetric: True # symmetric quantization
          method: "minmax" # minmax algorithm
      include: [ "*" ] # global w8a8 dynamic quantization
      exclude: [ "*down_proj*" ] # fall back (skip quantization for) the down_proj layers
  save:
    - type: "ascendv1_saver"
      part_file_size: 4 # at most 4 GB per safetensors weight file

(3) Register the model

Register the model name in the config.ini configuration file: add an emu3.5 entry in the ModelAdapter section, in the same way existing entries (e.g. qwen3 and its Qwen3ModelAdapter) map a model name to its adapter:

emu3_5 = EMU3.5, Emu3.5-Image

Then add the model adapter in the ModelAdapterEntryPoints section:

emu3_5 = msmodelslim.model.emu3_5.model_adapter:Emu35ModelAdapter

3 Install modelslim

cd msit/msmodelslim
bash install.sh
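
After installation, the msmodelslim command used in the next step should be on your PATH; assuming the CLI supports the conventional help flag, a quick sanity check is:

msmodelslim --help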

4 Run Quantization

# path to tokenizer_emu3_ibq
export TOKENIZER_PATH=~/msit/msmodelslim/msmodelslim/model/emu3_5/tokenizer_emu3_ibq
# path to the one-click quantization configuration file
yaml_config=msit/msmodelslim/msmodelslim/model/emu3_5/quant.yaml
# path to the original Emu3.5-Image weights
model_path=BAAI/Emu3.5-Image/
# path to save the quantized weights
save_path=BAAI/Emu3.5-Image-w8a8
msmodelslim quant --model_path $model_path \
                  --save_path $save_path \
                  --device npu \
                  --model_type EMU3.5 \
                  --config_path $yaml_config \
                  --trust_remote_code False

5 Inference After Quantization

Use vLLM for inference; set up the environment as described in Section 4, "Inference with vLLM".
After line 127 of src/utils/model_utils.py, add the parameter quantization="ascend", as sketched below.
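
A minimal sketch of the resulting call, assuming the surrounding code constructs a vllm.LLM instance and using an illustrative quantized-weight path:

from vllm import LLM

# Load the w8a8 weights produced in step 4; quantization="ascend" is the added parameter
llm = LLM(
    model="BAAI/Emu3.5-Image-w8a8",
    quantization="ascend",
    tensor_parallel_size=2,
)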


The Modelers community (Modelers.cn) is a neutral, public-interest AI community that provides hosting, showcasing, and collaboration services for AI tools, models, and data, offering an open platform for learning and exchange to AI developers and enthusiasts. The community operates under a council model, jointly built, operated, and owned by the whole industry chain, promoting a thriving domestic AI ecosystem.
