Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio, and it is also a multitask model that can perform multilingual speech recognition, speech translation, and spoken language identification.

1. Installing and using openai/whisper

1.1. Installing openai/whisper

conda create -n whisper python=3.10
source activate whisper

# Install whisper
pip3 install -U openai-whisper -i https://pypi.tuna.tsinghua.edu.cn/simple
# Replace PyTorch with a build matching the installed CUDA version
pip3 uninstall torch
pip3 install torch==2.5.1 --extra-index-url https://download.pytorch.org/whl/cu121 -i https://pypi.tuna.tsinghua.edu.cn/simple

# whisper needs ffmpeg available on the system; the steps below build it without root
# yasm must be built first as a dependency; after installing, add its bin directory to PATH
wget http://www.tortall.net/projects/yasm/releases/yasm-1.3.0.tar.gz
tar -zxvf yasm-1.3.0.tar.gz
cd yasm-1.3.0/
./configure --enable-shared --prefix=/media/yangdi/yasm-1.3.0
make -j20
make install
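After `make install`, the comment above says to add yasm to PATH. A minimal sketch, assuming the `--prefix` used in the `./configure` step:

```shell
# Make the freshly built yasm visible to the ffmpeg build that follows
# (path matches the --prefix passed to ./configure above)
export PATH=/media/yangdi/yasm-1.3.0/bin:$PATH
```

Add the same line to `~/.bashrc` if you want it to persist across sessions.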
# Build ffmpeg; after installing, add it to PATH and LD_LIBRARY_PATH
wget https://johnvansickle.com/ffmpeg/release-source/ffmpeg-4.1.tar.xz
tar -xvf ffmpeg-4.1.tar.xz
cd ffmpeg-4.1/
./configure --enable-shared --prefix=/media/yangdi/ffmpeg-4.1
make -j20
make install
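The comment above the ffmpeg build calls for PATH and LD_LIBRARY_PATH entries. A minimal sketch, assuming the `--prefix` used in the `./configure` step:

```shell
# Expose the ffmpeg binaries and, since --enable-shared was used,
# its shared libraries (paths match the --prefix passed to ./configure)
export PATH=/media/yangdi/ffmpeg-4.1/bin:$PATH
export LD_LIBRARY_PATH=/media/yangdi/ffmpeg-4.1/lib:$LD_LIBRARY_PATH
```

Verify with `ffmpeg -version` before moving on to whisper itself.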

1.2. Using openai/whisper

import whisper
import time

model_start_time = time.time()
# device selects a specific GPU; defaults to 0
model = whisper.load_model("turbo", device="cuda:0")
model_end_time = time.time()

print("load model {:.4f}s".format(model_end_time - model_start_time))

asr_start_time = time.time()
result = model.transcribe("test.wav")
asr_end_time = time.time()
print(result["text"])
print("use time {:.4f}s".format(asr_end_time - asr_start_time))

2. Installing and using faster-whisper

2.1. Installing faster-whisper

faster-whisper requires CUDA 12+ and cuDNN 9+; here we install CUDA 12.1 and cuDNN 9.1.
cuDNN downloads: https://developer.nvidia.com/cudnn-archive

# Install cuDNN 9.1
wget https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-9.1.1.17_cuda12-archive.tar.xz
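The archive then needs to be unpacked and its libraries made visible to the dynamic loader. A sketch of the remaining steps (the target directory under /media/yangdi is an assumption; adjust to your layout):

```shell
# Unpack the cuDNN archive downloaded above (target dir is an assumption)
tar -xvf cudnn-linux-x86_64-9.1.1.17_cuda12-archive.tar.xz -C /media/yangdi
# Point the dynamic loader at the cuDNN shared libraries
export LD_LIBRARY_PATH=/media/yangdi/cudnn-linux-x86_64-9.1.1.17_cuda12-archive/lib:$LD_LIBRARY_PATH
```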


conda create -n faster-whisper python=3.10
source activate faster-whisper

# Install faster-whisper
pip3 install faster-whisper -i https://pypi.tuna.tsinghua.edu.cn/simple

2.2. Using faster-whisper

Download the turbo model from Hugging Face first, then load it from the local path with local_files_only=True.
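One way to fetch the converted CTranslate2 weights is with `huggingface-cli` from the huggingface_hub package. The repo id below is an assumption; substitute whichever converted turbo repository you actually use:

```shell
# Download the CT2-converted turbo model into a local directory
# (repo id is an assumption -- use the converted-turbo repo of your choice)
huggingface-cli download deepdml/faster-whisper-large-v3-turbo-ct2 \
    --local-dir faster-whisper-large-v3-turbo-ct2
```

The resulting directory name is what gets passed to WhisperModel below.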

from faster_whisper import WhisperModel
import time

model_start_time = time.time()
# Run on GPU with FP16
model = WhisperModel("faster-whisper-large-v3-turbo-ct2", device="cuda", device_index=0, compute_type="float16", local_files_only=True)
model_end_time = time.time()
print("load model: {:.4f}s".format(model_end_time - model_start_time))

text_list = []
asr_start_time = time.time()
segments, info = model.transcribe("data-en/Anne Hathaway Forgets The Princess Diaries and The Devil Wears Prada Details.wav", beam_size=5)
segments = list(segments)
asr_end_time = time.time()
for segment in segments:
    #print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    text_list.append(segment.text)

text = " ".join(text_list)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
print(text)
print("use time {:.4f}s".format(asr_end_time - asr_start_time))

3. openai/whisper vs. faster-whisper

Both use the turbo model; times were measured on an RTX 4090D.
faster-whisper runs in FP16.

Metric                whisper   faster-whisper   Improvement
Model load time       8.4s      1.7s             80%
GPU memory usage      6 GB      2.5 GB           58%
Transcription time    21.5s     23.2s            -8%
WER                   0.1782    0.1795           -0.8%
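For reference, WER (word error rate) in the table is the word-level edit distance between hypothesis and reference, divided by the number of reference words. A minimal stdlib sketch (in practice a library such as jiwer is typically used):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        d[i][0] = i
    for j in range(1, len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown box"))  # -> 0.25
```

A WER of 0.1782 therefore means roughly 18 word errors per 100 reference words; the 0.8% relative gap between the two backends is negligible in practice.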