I'm trying to read, in near-realtime, the volume coming from the audio of a USB microphone in Python.

I have the pieces, but can't figure out how to put them together.

If I already have a .wav file, I can pretty simply read it using wavefile:

    import numpy as np
    from wavefile import WaveReader

    with WaveReader("/Users/rmartin/audio.wav") as r:
        for data in r.read_iter(size=512):
            left_channel = data[0]
            # Euclidean norm of the block, used as a rough loudness measure
            volume = np.linalg.norm(left_channel)
            print(volume)

This works great, but I want to process the audio from the microphone in real-time, not from a file.

So my thought was to use something like ffmpeg to pipe the real-time output into WaveReader, but my knowledge of handling raw bytes is somewhat lacking.

    import subprocess
    import numpy as np

    command = ["/usr/local/bin/ffmpeg",
               '-f', 'avfoundation',
               '-i', ':2',
               '-t', '5',
               '-ar', '11025',
               '-ac', '1',
               '-acodec', 'aac', '-']

    pipe = subprocess.Popen(command, stdout=subprocess.PIPE, bufsize=10**8)
    stdout_data = pipe.stdout.read()
    audio_array = np.frombuffer(stdout_data, dtype="int16")
    print(audio_array)

That looks pretty, but it doesn't do much. It fails with a [NULL @ 0x7ff640016600] Unable to find a suitable output format for 'pipe:' error.
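
The error most likely comes from the missing output format: ffmpeg normally infers the container from the output file's extension, and `-` (stdout) has none, so the format has to be named with `-f`. AAC output also could not be read as `int16` samples even if it were written. Below is a minimal sketch of the piping idea that asks for raw 16-bit PCM instead. The helper names (`build_command`, `chunk_rms`, `stream_rms`) are hypothetical, the `:2` device index is taken from the command above, and `avfoundation` would have to be swapped for something like `alsa` or `pulse` on Linux. It uses only the standard library; `np.frombuffer(raw, dtype="<i2")` would do the same decoding job.

```python
import array
import math
import subprocess
import sys


def build_command(device=":2", rate=11025, ffmpeg="ffmpeg",
                  input_format="avfoundation"):
    """Build an ffmpeg command that writes raw mono 16-bit PCM to stdout."""
    return [ffmpeg, "-loglevel", "error",
            "-f", input_format, "-i", device,
            "-ar", str(rate), "-ac", "1",
            "-acodec", "pcm_s16le",   # uncompressed samples instead of AAC
            "-f", "s16le", "-"]       # explicit output format for the pipe


def chunk_rms(raw):
    """Return the RMS level (0.0 to 1.0) of little-endian int16 PCM bytes."""
    samples = array.array("h")
    samples.frombytes(raw[:len(raw) - len(raw) % 2])  # drop a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def stream_rms(command, chunk_samples=512):
    """Yield one RMS value per chunk read from the ffmpeg process."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE)
    try:
        while True:
            raw = proc.stdout.read(chunk_samples * 2)  # 2 bytes per sample
            if not raw:
                break
            yield chunk_rms(raw)
    finally:
        proc.terminate()
        proc.wait()
```

With ffmpeg installed and a microphone attached, `for level in stream_rms(build_command()): print(level)` should print one value roughly every 46 ms at these settings (512 samples at 11025 Hz).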

I assume this is a fairly simple thing to do given that I only need to check the audio for volume levels.

Does anyone know how to accomplish this simply? ffmpeg isn't a requirement, but the solution does need to work on OS X and Linux.

Solution

Thanks to @Matthias for the suggestion to use the sounddevice module. It's exactly what I need.

For posterity, here is a working example that prints real-time audio levels to the shell:

    # Print out realtime audio volume as ascii bars
    import sounddevice as sd
    import numpy as np

    def print_sound(indata, outdata, frames, time, status):
        # Scale the block's norm so quiet input still produces visible bars
        volume_norm = np.linalg.norm(indata) * 10
        print("|" * int(volume_norm))

    with sd.Stream(callback=print_sound):
        sd.sleep(10000)  # keep the stream open for 10 seconds
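
Since only the microphone is read, an input-only stream should be enough. Here is a hedged variation, assuming the default input device and sounddevice's default float32 samples in the range -1.0 to 1.0. The helper names (`level_db`, `bar`, `monitor`) are hypothetical and not part of sounddevice. It swaps `np.linalg.norm` for RMS, because the norm equals RMS times the square root of the block length and so grows with block size, and it reports the level in decibels relative to full scale.

```python
import math


def level_db(rms, floor_db=-80.0):
    """Convert a linear RMS level (1.0 = full scale) to dBFS, clamped at floor_db."""
    if rms <= 0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(rms))


def bar(db, floor_db=-80.0, width=50):
    """Render a dBFS value as an ASCII bar between floor_db and 0 dBFS."""
    filled = int(round((db - floor_db) / -floor_db * width))
    return "|" * max(0, min(width, filled))


def monitor(seconds=10):
    """Print one bar per audio block from the default input device."""
    # Imported here so the helpers above work without audio hardware.
    import numpy as np
    import sounddevice as sd

    def callback(indata, frames, time, status):
        if status:
            print(status)  # report overflows and similar stream warnings
        rms = float(np.sqrt(np.mean(np.square(indata))))
        print(bar(level_db(rms)))

    with sd.InputStream(callback=callback):
        sd.sleep(int(seconds * 1000))
```

Calling `monitor()` runs for ten seconds. The -80 dB floor and 50-character width are arbitrary choices and can be adjusted to taste.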
