Day-0 Support: Ascend Adapts BAAI's Emu3.5 Model, Now Available on the Modelers Community
01 Model Introduction
Emu3.5: Native Multimodal Models are World Learners
Emu3.5 Team, BAAI
| 🔹 | Core Concept | Description |
|---|---|---|
| 🧠 | Unified World Modeling | Predicts the next state jointly across vision and language, enabling coherent world modeling and generation. |
| 🧩 | End-to-End Pretraining | Trained with a unified next-token prediction objective over interleaved vision–language sequences. |
| 📚 | 10T+ Multimodal Tokens | Pre-trained on over 10 trillion interleaved tokens from video frames and transcripts, capturing spatiotemporal structure. |
| 🔄 | Native Multimodal I/O | Processes and generates interleaved visual–text sequences without modality adapters or task-specific heads. |
| 🎯 | RL Post-Training | Large-scale reinforcement learning enhances reasoning, compositionality, and generation quality. |
| ⚡ | Discrete Diffusion Adaptation (DiDA) | Converts sequential decoding → bidirectional parallel prediction, achieving ≈20× faster inference without performance loss. |
| 🖼️ | Versatile Generation | Excels in long-horizon vision–language generation, any-to-image (X2I) synthesis, and text-rich image creation. |
| 🌐 | Generalizable World Modeling | Enables spatiotemporally consistent world exploration, and open-world embodied manipulation across diverse scenarios. |
| 🏆 | Performance Benchmark | Matches Gemini 2.5 Flash Image (Nano Banana) on image generation/editing, and outperforms on interleaved generation tasks. |
02 Quick Start
1 Prepare Resources
Hardware: Atlas 800I/800T A2 (64 GB) or Atlas 800I/800T A3
Run the following shell command to pull the vllm-ascend inference container image:
A2
docker pull quay.nju.edu.cn/ascend/vllm-ascend:v0.11.0rc3
A3
docker pull quay.nju.edu.cn/ascend/vllm-ascend:v0.11.0rc3-a3
2 Download Weights
| Weights | Modelers Community Download Link |
|---|---|
| Emu3.5 | link |
| Emu3.5-Image | link |
| Emu3.5-VisionTokenizer | link |
3 Inference with transformers
(1) Create and enter the container
A2
docker run -it --net=host --shm-size=500g \
--privileged \
--name emu3.5 \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
-v /usr/local/sbin:/usr/local/sbin:ro \
-v /data:/data \
quay.nju.edu.cn/ascend/vllm-ascend:v0.11.0rc3 bash
A3
docker run -it --net=host --shm-size=500g \
--privileged \
--name emu3.5 \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
-v /usr/local/sbin:/usr/local/sbin:ro \
-v /data:/data \
quay.nju.edu.cn/ascend/vllm-ascend:v0.11.0rc3-a3 bash
(2) Environment Setup
cd /vllm-workspace/
git clone https://github.com/baaivision/Emu3.5.git
cd Emu3.5/
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install -r requirements/common.txt
pip install accelerate
pip install transformers==4.48.2
In src/utils/model_utils.py, change line 38 from
attn_implementation="flash_attention_2",
to
attn_implementation="eager",
(3) Run Inference
The example_config_*.py files are the reference configurations for the different tasks. Before running, edit the following entries in the chosen config file, as shown in the sketch below:
model_path = "./weights/Emu3.5-Image"  # change to the actual path
vq_path = "./weights/Emu3.5-VisionTokenizer"  # change to the actual path
vq_device = "cuda:0"  # change to vq_device = "npu:0"
Inference commands:
# 🖼️ Text-to-Image (T2I) task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference.py --cfg configs/example_config_t2i.py
# 🔄 Any-to-Image (X2I) task
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 python inference.py --cfg configs/example_config_x2i.py
# 🎯 Visual Guidance task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference.py --cfg configs/example_config_visual_guidance.py
# 📖 Visual Narrative task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference.py --cfg configs/example_config_visual_narrative.py
# After running inference, the model will generate results in protobuf format (.pb files) for each input prompt.
4 Inference with vLLM
(1) Create and enter the container
A2
docker run -it --net=host --shm-size=500g \
--privileged \
--name emu3.5-vllm \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
-v /usr/local/sbin:/usr/local/sbin:ro \
-v /data:/data \
quay.nju.edu.cn/ascend/vllm-ascend:v0.11.0rc3 bash
A3
docker run -it --net=host --shm-size=500g \
--privileged \
--name emu3.5-vllm \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
-v /usr/local/sbin:/usr/local/sbin:ro \
-v /data:/data \
quay.nju.edu.cn/ascend/vllm-ascend:v0.11.0rc3-a3 bash
(2) Environment Setup
cd /vllm-workspace/
git clone https://github.com/baaivision/Emu3.5.git
cd Emu3.5/
python src/patch/apply.py
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install -r requirements/common.txt
Code modifications
(1) In /vllm-workspace/vllm/vllm/v1/core/sched/scheduler.py, at line 701, change
return CachedRequestData(
    req_ids=req_ids,
    resumed_from_preemption=resumed_from_preemption,
    new_token_ids=new_token_ids,
    new_block_ids=new_block_ids,
    num_computed_tokens=num_computed_tokens,
)
to
return CachedRequestData(
    req_ids=req_ids,
    resumed_from_preemption=resumed_from_preemption,
    new_token_ids=new_token_ids,
    new_block_ids=new_block_ids,
    num_computed_tokens=num_computed_tokens,
    sampling_params=None,
    hybrid_metadata=None,
)
(2) In /vllm-workspace/vllm/vllm/v1/sample/logits_processor/builtin.py, at line 58, change
for index, params, _, _, _ in batch_update.added:
to
for index, params, _, _ in batch_update.added:
(3) In /vllm-workspace/vllm/vllm/v1/sample/logits_processor/builtin.py, at line 249, change
for index, params, prompt_tok_ids, output_tok_ids, _ in batch_update.added
to
for index, params, prompt_tok_ids, output_tok_ids in batch_update.added
(4) In src/utils/model_utils.py, comment out "full_cuda_graph": True, on line 131 and add "cudagraph_mode": "FULL_DECODE_ONLY", after it, i.e. change
compilation_config={
"full_cuda_graph": True,
"backend": "cudagraph",
"cudagraph_capture_sizes": [1, 2],
},
to
compilation_config={
#"full_cuda_graph": True,
"cudagraph_mode":"FULL_DECODE_ONLY",
"backend": "cudagraph",
"cudagraph_capture_sizes": [1, 2],
},
(3) Run Inference
- Set environment variables
export VLLM_WORKER_MULTIPROC_METHOD="spawn"
export TASK_QUEUE_ENABLE=1
export CPU_AFFINITY_CONF=2
export PYTORCH_NPU_ALLOC_CONF="max_split_size_mb:250,expandable_segments:True"
export HCCL_OP_EXPANSION_MODE="AIV"
- Modify the configuration files
The example_config_*.py files are the reference configurations for the different tasks. Before running, make the same edits as in the transformers section:
model_path = "./weights/Emu3.5-Image"  # change to the actual path
vq_path = "./weights/Emu3.5-VisionTokenizer"  # change to the actual path
vq_device = "cuda:0"  # change to vq_device = "npu:0"
- Inference commands
# 🖼️ Text-to-Image (T2I) task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference_vllm.py --cfg configs/example_config_t2i.py --tensor-parallel-size 2 --gpu-memory-utilization 0.8
# 🔄 Any-to-Image (X2I) task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference_vllm.py --cfg configs/example_config_x2i.py --tensor-parallel-size 2 --gpu-memory-utilization 0.7
# 🎯 Visual Guidance task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference_vllm.py --cfg configs/example_config_visual_guidance.py --tensor-parallel-size 2 --gpu-memory-utilization 0.8
# 📖 Visual Narrative task
ASCEND_RT_VISIBLE_DEVICES=0,1 python inference_vllm.py --cfg configs/example_config_visual_narrative.py --tensor-parallel-size 2 --gpu-memory-utilization 0.8
# After running inference, the model will generate results in protobuf format (.pb files) for each input prompt.
5 Convert Inference Results
Run the following command to convert the inference results for visualization:
python src/utils/vis_proto.py --input <input_proto_path> [--output <output_dir>] [--video]
6 Gradio Demo
We provide two Gradio Demos for different application scenarios:
Emu3.5-Image Demo —— Interactive interface optimized for Text-to-Image (T2I) and Any-to-Image (X2I) tasks:
ASCEND_RT_VISIBLE_DEVICES=0,1 python gradio_demo_image.py --host 0.0.0.0 --port 7860
Emu3.5-Interleave Demo —— Interactive interface for the Emu3.5 interleaved tasks (Visual Guidance and Visual Narrative):
ASCEND_RT_VISIBLE_DEVICES=0,1 python gradio_demo_interleave.py --host 0.0.0.0 --port 7860
03 Quantization
Quantize the model with the modelslim quantization tool.
1 Get the modelslim Source Code
git clone https://gitcode.com/Ascend/msit.git
2 Integrate the Emu3.5 Model
(1) Implement the model quantization code
cd msit/msmodelslim/msmodelslim/model
mkdir emu3_5
cd emu3_5
cp -r <Emu3.5 source path>/src/tokenizer_emu3_ibq/ ./
cp -r <Emu3.5 source path>/src/emu3p5/modeling_emu3.py ./
cp -r <Emu3.5 source path>/src/emu3p5/configuration_emu3.py ./
Copy __init__.py and model_adapter.py into the emu3_5 directory.
(2) Add the quantization configuration file
Create quant.yaml and add the following content:
apiversion: modelslim_v1
spec:
  process:
    - type: "iter_smooth"
      alpha: 0.9
      scale_min: 1e-5
      symmetric: True
      enable_subgraph_type:
        - 'norm-linear'
        - 'linear-linear'
        - 'ov'
        - 'up-down'
      include:
        - "*"
      exclude:
        - "*self_attn*"
    - type: "quarot"
      online: False
      block_size: -1
      max_tp_size: 4
      down_proj_online_layers: [ ]
    - type: "linear_quant"        # linear-layer quantization
      qconfig:
        act:                      # activation quantization
          scope: "per_token"      # dynamic quantization
          dtype: "int8"           # 8-bit integer quantization
          symmetric: True         # symmetric quantization
          method: "minmax"        # minmax algorithm
        weight:                   # weight quantization
          scope: "per_channel"    # per-channel quantization
          dtype: "int8"           # 8-bit integer quantization
          symmetric: True         # symmetric quantization
          method: "minmax"        # minmax algorithm
      include: [ "*" ]            # global W8A8 dynamic quantization
      exclude: [ "*down_proj*" ]  # fall back: keep down_proj layers unquantized
  save:
    - type: "ascendv1_saver"
      part_file_size: 4           # max size of each safetensors weight file, in GB
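To make the per_channel / per_token minmax settings above concrete, here is a small illustrative sketch of the arithmetic that symmetric int8 minmax quantization performs (this is not msmodelslim code, only the underlying computation the config refers to):

```python
import torch

def quantize_weight_per_channel(w: torch.Tensor):
    # Symmetric per-channel int8 minmax quantization:
    # one scale per output channel, scale = max(|w|) / 127.
    scale = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale  # dequantize with q.float() * scale

def quantize_activation_per_token(x: torch.Tensor):
    # Symmetric per-token (dynamic) int8 quantization:
    # one scale per token row, computed at runtime from the activation itself.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp_min(1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale
```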
(3) Register the model
Register the model name in the config.ini configuration file.
In the ModelAdapter section, register the Emu3.5 model; the emu3_5 key corresponds to the Emu35ModelAdapter adapter added below:
emu3_5 = EMU3.5, Emu3.5-Image
Add the model adapter in the ModelAdapterEntryPoints section:
emu3_5 = msmodelslim.model.emu3_5.model_adapter:Emu35ModelAdapter
3 Install modelslim
cd msit/msmodelslim
bash install.sh
4 Run Quantization
# Path to tokenizer_emu3_ibq
export TOKENIZER_PATH=~/msit/msmodelslim/msmodelslim/model/emu3_5/tokenizer_emu3_ibq
# Path to the one-click quantization configuration file
yaml_config=msit/msmodelslim/msmodelslim/model/emu3_5/quant.yaml
# Path to the original Emu3.5-Image weights
model_path=BAAI/Emu3.5-Image/
# Path to save the quantized weights
save_path=BAAI/Emu3.5-Image-w8a8
msmodelslim quant --model_path $model_path \
--save_path $save_path \
--device npu \
--model_type EMU3.5 \
--config_path $yaml_config \
--trust_remote_code False
5 Inference after Quantization
Use vLLM for inference; for the environment setup, see Section 4, Inference with vLLM.
In src/utils/model_utils.py, add the parameter quantization="ascend", after line 127.
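A minimal sketch of where this parameter lands, assuming model_utils.py builds the engine with vLLM's LLM class (the other arguments are placeholders, not the exact values in the file):

```python
from vllm import LLM

# Illustrative only: quantization="ascend" tells vllm-ascend to load the
# msmodelslim-quantized W8A8 weights on Ascend NPUs.
llm = LLM(
    model="/data/weights/BAAI/Emu3.5-Image-w8a8",  # path to the quantized weights
    tensor_parallel_size=2,
    gpu_memory_utilization=0.8,
    quantization="ascend",                         # the parameter added after line 127
    compilation_config={
        "cudagraph_mode": "FULL_DECODE_ONLY",
        "backend": "cudagraph",
        "cudagraph_capture_sizes": [1, 2],
    },
)
```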
The Modelers community (Modelers.cn) is a neutral, non-profit AI community that provides hosting, showcasing, and collaborative application services for AI tools, models, and data, offering an open platform for learning and exchange for AI developers and enthusiasts. The community is governed by a council and is jointly built, operated, and shared by the whole industry chain, promoting the prosperity of the domestic AI ecosystem.