I want to make a speaking mouth that moves, lights up, or does something similar whenever a playing WAV file emits sound. So I need to detect when the WAV file is speaking and when it is in a silence between words. Currently I'm using a pygame script that I found:
import pygame
pygame.mixer.init()
pygame.mixer.music.load("my_sentence.wav")
pygame.mixer.music.play()
while pygame.mixer.music.get_busy() == True:
    continue
I guess I could add a check inside the while loop that looks at the sound's output level, or something like that, and then sends it to one of the GPIO outputs. But I don't know how to achieve that.
Any help would be much appreciated.
The WAV format doesn't mandate a specific audio encoding. However, the most common encoding is a variant of PCM: LPCM. In practice, you'll probably find that 99.9% of the WAV files you come across just contain PCM data.
python-sounddevice: to play WAV files, numpy and soundfile need to be installed so that WAV files can be opened as NumPy arrays. The line containing sf.read() extracts the raw audio data, as well as the sampling rate of the file as stored in its RIFF header.
Size scales with sample rate and bit depth. A five-minute CD-quality track (16-bit, 44.1 kHz, stereo) should be about 50.5 MB. A 24-bit 44.1 kHz WAV file will be 150% the size of a CD track of equal length, because 24 bits carry 150% of the information of 16 bits.
You'll need to inspect the WAV file to work out when the voice is present. The simplest way to do this is to look for loud and quiet periods. Because sound is a wave, the values in the WAV file won't change very much when it's quiet, and they'll change a lot when it's loud.
One way of estimating loudness is the variance. As you can see in the article, this can be defined as E[(X - mu)^2], which could be written average((X - average(X))^2). Here, X is the value of the signal at a given point (the values stored in the WAV file, called sample in the code). If it's changing a lot, the variance will be large.
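To make the formula concrete, here is a minimal sketch that applies average((X - average(X))^2) to two short lists of samples. The helper name variance and the sample values are made up for illustration; they don't come from a real recording.

```python
def variance(samples):
    """Return average((x - average(x)) ** 2) over a sequence of samples."""
    mean = sum(samples) / float(len(samples))
    return sum((x - mean) ** 2 for x in samples) / float(len(samples))

# Made-up 16-bit sample values, for illustration only.
quiet = [3, -2, 1, 0, -1, 2, -3, 0]
loud = [12000, -9000, 15000, -14000, 8000, -11000, 13000, -10000]

print(variance(quiet))  # small: the values barely move
print(variance(loud))   # many orders of magnitude larger
```

The quiet list gives a variance of a few units, while the loud one lands in the tens of millions, which is why a single threshold can separate the two.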
This would let you calculate the loudness of an entire file. However, you want to track how loud the file is at any given time, which means you need a form of moving average. An easy way to get this is with a first-order low-pass filter.
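As a rough sketch of that idea, a first-order low-pass filter replaces each average with "mostly the old estimate, plus a little of the new value". The function name running_variance, the alpha value, and the synthetic signal below are assumptions chosen for illustration.

```python
def running_variance(samples, alpha):
    """Return the smoothed variance estimate after each sample.

    alpha is the filter coefficient in (0, 1]; smaller values react
    more slowly but give a less noisy estimate.
    """
    mean = 0.0
    variance = 0.0
    estimates = []
    for sample in samples:
        # Low-pass filter the signal to track its mean.
        mean = (1 - alpha) * mean + alpha * sample
        # Low-pass filter the squared deviation to track the variance.
        variance = (1 - alpha) * variance + alpha * (sample - mean) ** 2
        estimates.append(variance)
    return estimates

# Synthetic signal: silence, a loud alternating burst, then silence again.
signal = [0] * 200 + [8000, -8000] * 100 + [0] * 200
estimate = running_variance(signal, alpha=0.05)
```

The estimate stays at zero during the first silence, climbs during the burst, and decays again afterwards, so comparing it with a threshold gives a loud/quiet flag that follows the signal over time.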
I haven't tested the code at the end of this answer (get_loud_times and play_sentence), so it may need some debugging, but it should get you started. It loads the WAV file, uses low-pass filters to track the mean and variance, and works out when the variance goes above and below a certain threshold. Then, while playing the WAV file, it keeps track of the time since playback started and prints out whether the WAV file is loud or quiet.
Here's what you might still need to do:
I hope this helps!
import struct
import time
import wave

import pygame


def get_loud_times(wav_path, threshold=10000, time_constant=0.1):
    '''Work out which parts of a WAV file are loud.
    - threshold: the variance threshold that is considered loud
    - time_constant: the approximate reaction time in seconds'''
    wav = wave.open(wav_path, 'r')
    length = wav.getnframes()
    samplerate = wav.getframerate()
    assert wav.getnchannels() == 1, 'wav must be mono'
    assert wav.getsampwidth() == 2, 'wav must be 16-bit'
    # Our result will be a list of (time, is_loud) giving the times
    # when the audio switches from loud to quiet and back.
    is_loud = False
    result = [(0., is_loud)]
    # The following values track the mean and variance of the signal.
    # When the variance is large, the audio is loud.
    mean = 0
    variance = 0
    # If alpha is small, mean and variance change slower but are less noisy.
    alpha = 1.0 / (time_constant * samplerate)
    for i in range(length):
        sample_time = float(i) / samplerate
        # struct.unpack returns a tuple, so take its only element
        sample = struct.unpack('<h', wav.readframes(1))[0]
        # mean is the average value of sample
        mean = (1 - alpha) * mean + alpha * sample
        # variance is the average value of (sample - mean) ** 2
        variance = (1 - alpha) * variance + alpha * (sample - mean) ** 2
        # check if we're loud, and record the time if this changes
        new_is_loud = variance > threshold
        if is_loud != new_is_loud:
            result.append((sample_time, new_is_loud))
        is_loud = new_is_loud
    wav.close()
    return result


def play_sentence(wav_path):
    loud_times = get_loud_times(wav_path)
    # the mixer must be initialised before music can be loaded
    pygame.mixer.init()
    pygame.mixer.music.load(wav_path)
    start_time = time.time()
    pygame.mixer.music.play()
    for (t, is_loud) in loud_times:
        # wait until the time described by this entry
        sleep_time = start_time + t - time.time()
        if sleep_time > 0:
            time.sleep(sleep_time)
        # do whatever
        print('loud' if is_loud else 'quiet')
    # keep the script alive until playback has finished
    while pygame.mixer.music.get_busy():
        time.sleep(0.05)
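To experiment without a real recording, it can help to have a file whose loud and quiet parts are known in advance. The sketch below is my own addition under stated assumptions: it uses only the standard library to write a mono 16-bit WAV (the format the asserts above require) containing half a second of silence, half a second of a sine tone, and another half second of silence. The name write_test_wav and its parameters are hypothetical.

```python
import math
import struct
import wave


def write_test_wav(path, samplerate=8000, tone_hz=440.0, amplitude=10000):
    """Write 0.5 s silence, 0.5 s sine tone, 0.5 s silence as mono 16-bit PCM.

    Returns the number of samples written.
    """
    half_second = samplerate // 2
    silence = [0] * half_second
    tone = [int(amplitude * math.sin(2 * math.pi * tone_hz * n / samplerate))
            for n in range(half_second)]
    samples = silence + tone + silence
    wav = wave.open(path, 'wb')
    try:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit
        wav.setframerate(samplerate)
        # Pack every sample as a little-endian signed short.
        wav.writeframes(struct.pack('<%dh' % len(samples), *samples))
    finally:
        wav.close()
    return len(samples)
```

Passing the resulting file to get_loud_times gives a known signal to tune threshold and time_constant against: the tone starts at 0.5 s and stops at 1.0 s.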