Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

get playing wav audio level as output

I want to make a speaking mouth which moves or emits light or something when a playing wav file emits sound. So I need to detect when a wav file is speaking or when it is in a silence between words. Currently I'm using a pygame script that I have found

import pygame
pygame.mixer.init()
pygame.mixer.music.load("my_sentence.wav")
pygame.mixer.music.play()
while pygame.mixer.music.get_busy() == True:
    continue

I guess I could make some checking at the while loop to look the sounds output level, or something like that, and then send it to one of the gpio outputs. But I don't know how to achieve that.

Any help would be much appreciated

like image 773
cor Avatar asked Jun 01 '15 06:06

cor


People also ask

Are all wav files PCM?

WAV files doesn't specify any specific audio encoding. However, the most common encoding seems to be a variant of PCM: LPCM. That's not entirely true, but in practice you'll probably find 99.9% of the WAV files you will come across will just contain PCM data.

How do I play a WAV file in Python?

python-sounddevice In order to play WAV files, numpy and soundfile need to be installed, to open WAV files as NumPy arrays. The line containing sf. read() extracts the raw audio data, as well as the sampling rate of the file as stored in its RIFF header, and sounddevice.

How big should a wav be?

Size scales with samplerate and bit depth. A five-minute CD track should be 50.5 MB. For example, a 24-bit 44.1 kHz WAV file will be 150% the size of a CD track of equal length (24-bit is 150% of the information of 16-bit).


1 Answers

You'll need to inspect the WAV file to work out when the voice is present. The simplest way to do this is look for loud and quiet periods. Because sound works with waves, when it's quiet the values in the wave file won't change very much, and when it's loud they'll be changing a lot.

One way of estimating loudness is the variance. As you can see the the article, this can be defined as E[(X - mu)^2], which could be written average((X - average(X))^2). Here, X is the value of the signal at a given point (the values stored in the WAV file, called sample in the code). If it's changing a lot, the variance will be large.

This would let you calculate the loudness of an entire file. However, you want to track how loud the file is at any given time, which means you need a form of moving average. An easy way to get this is with a first-order low-pass filter.

I haven't tested the code below so it's extremely unlikely to work, but it should get you started. It loads the WAV file, uses low-pass filters to track the mean and variance, and works out when the variance goes above and below a certain threshold. Then, while playing the WAV file it keeps track of the time since it started playing, and prints out whether the WAV file is loud or quiet.

Here's what you might still need to do:

  • Fix all my deliberate mistakes in the code
  • Add something useful to react to the loud/quiet changes
  • Change the threshold and reaction_time to get good results with your audio
  • Add some hysteresis (a variable threshold) to stop the light flickering

I hope this helps!

import wave
import struct
import time

def get_loud_times(wav_path, threshold=10000, time_constant=0.1):
    '''Work out which parts of a WAV file are loud.
        - threshold: the variance threshold that is considered loud
        - time_constant: the approximate reaction time in seconds'''

    wav = wave.open(wav_path, 'r')
    length = wav.getnframes()
    samplerate = wav.getframerate()

    assert wav.getnchannels() == 1, 'wav must be mono'
    assert wav.getsampwidth() == 2, 'wav must be 16-bit'

    # Our result will be a list of (time, is_loud) giving the times when
    # when the audio switches from loud to quiet and back.
    is_loud = False
    result = [(0., is_loud)]

    # The following values track the mean and variance of the signal.
    # When the variance is large, the audio is loud.
    mean = 0
    variance = 0

    # If alpha is small, mean and variance change slower but are less noisy.
    alpha = 1 / (time_constant * float(sample_rate))

    for i in range(length):
        sample_time = float(i) / samplerate
        sample = struct.unpack('<h', wav.readframes(1))

        # mean is the average value of sample
        mean = (1-alpha) * mean + alpha * sample

        # variance is the average value of (sample - mean) ** 2
        variance = (1-alpha) * variance + alpha * (sample - mean) ** 2

        # check if we're loud, and record the time if this changes
        new_is_loud = variance > threshold
        if is_loud != new_is_loud:
            result.append((sample_time, new_is_loud))
        is_loud = new_is_loud

    return result

def play_sentence(wav_path):
    loud_times = get_loud_times(wav_path)
    pygame.mixer.music.load(wav_path)

    start_time = time.time()
    pygame.mixer.music.play()

    for (t, is_loud) in loud_times:
        # wait until the time described by this entry
        sleep_time = start_time + t - time.time()
        if sleep_time > 0:
            time.sleep(sleep_time)

        # do whatever
        print 'loud' if is_loud else 'quiet'
like image 53
Rodrigo Queiro Avatar answered Oct 16 '22 07:10

Rodrigo Queiro