Trying to get the frequencies of a .wav file in Python

Tags: python, python-3.x, audio, wav

I know that questions about .wav files in Python have been just about beaten to death, but I am extremely frustrated because no one's answer seems to work for me. What I'm trying to do seems relatively simple to me: I want to know exactly what frequencies there are in a .wav file at given times. I want to know, for example, "from time n milliseconds to n + 10 milliseconds, the average frequency of the sound was x hertz". I have seen people talking about Fourier transforms and Goertzel algorithms, as well as various modules, but I can't figure out how to get any of them to do what I've described. I've searched for things like "find frequency of a wav file in python" about twenty times, to no avail. Can someone please help me?

What I'm looking for is a solution like this pseudocode, or at least one that will do something like what the pseudocode is getting at:

import some_module_that_can_help_me_do_this as freq

file = 'output.wav'
start_time = 1000  # Start 1000 milliseconds into the file
end_time = 1010  # End 10 milliseconds thereafter

print("Average frequency = " + str(freq.average(file, start_time, end_time)) + " Hz")

Please assume (as I'm sure you can tell) that I'm an idiot at math. This is my first question here, so be gentle.


asked Feb 10 '19 00:02

dfalzone - Reinstate Monica

2 Answers

This answer is quite late, but you could try this:

(Note: I deserve very little credit for this since I got most of it from other SO posts and this great article on FFT using Python: https://realpython.com/python-scipy-fft/)

import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.io import wavfile


def freq(file, start_time, end_time):
    """Return the dominant frequency (Hz) between start_time and end_time (ms)."""

    # Open the file; if it has several channels, keep only the first one
    sr, data = wavfile.read(file)
    if data.ndim > 1:
        data = data[:, 0]

    # Take the slice of samples from start_time to end_time (milliseconds)
    start = int(start_time * sr / 1000)
    end = int(end_time * sr / 1000) + 1
    data_to_read = data[start:end]

    # Fourier transform: yf is the spectrum, xf the frequency (Hz) of each bin
    n = len(data_to_read)
    yf = rfft(data_to_read)
    xf = rfftfreq(n, 1 / sr)

    # Uncomment these to see the frequency spectrum as a plot
    # (this also needs: import matplotlib.pyplot as plt)
    # plt.plot(xf, np.abs(yf))
    # plt.show()

    # Get the most dominant frequency and return it
    idx = np.argmax(np.abs(yf))
    return xf[idx]

This code should work for any .wav file that scipy can read, but the result may be slightly off: it returns only the single most dominant frequency rather than an average, and it uses only the first channel of the audio (if not mono).
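To get something closer to "the frequency at given times", the same idea can be applied to one window after another. Below is a minimal sketch using only numpy on a synthetic signal, so it runs without a .wav file; the helper name `dominant_frequencies` and the test signal are made up for illustration. Keep in mind that FFT bins are spaced sr / n apart (that is, 1 / window length), so a 10 ms window can only tell frequencies apart in 100 Hz steps; the sketch uses 100 ms windows for that reason.

```python
import numpy as np

def dominant_frequencies(samples, sr, window_ms):
    """Return a list of (start_ms, dominant frequency in Hz), one per full window.

    Hypothetical helper for illustration, not part of any library.
    """
    n = int(sr * window_ms / 1000)          # samples per window
    freqs = np.fft.rfftfreq(n, d=1 / sr)    # bin frequencies, spaced sr / n apart
    results = []
    for start in range(0, len(samples) - n + 1, n):
        window = samples[start:start + n]
        spectrum = np.abs(np.fft.rfft(window))
        spectrum[0] = 0                     # ignore the DC (0 Hz) component
        results.append((start * 1000 / sr, freqs[np.argmax(spectrum)]))
    return results

# Synthetic test signal: 0.5 s of a 500 Hz tone followed by 0.5 s of 1500 Hz
sr = 8000
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.sin(2 * np.pi * 500 * t),
                         np.sin(2 * np.pi * 1500 * t)])

for start_ms, f in dominant_frequencies(signal, sr, window_ms=100):
    print(f"{start_ms:6.0f} ms: {f:.0f} Hz")
```

With a real file, `samples` and `sr` would come from `wavfile.read` as in the function above.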

If you want to learn more about how the Fourier transform works, check out this video by 3blue1brown with a visual explanation: https://www.youtube.com/watch?v=spUNpyF58BY

answered Sep 20 '22 14:09

kidkoder432

If you'd like to detect the pitch of a sound (and it seems you do), then in terms of Python libraries your best bet is aubio. Please consult this example for implementation.

import numpy as np
from aubio import source, pitch

your_file = 'output.wav'  # example path; replace with your own .wav file

win_s = 4096  # analysis window size, in samples
hop_s = 512   # number of new samples read at each step

# A samplerate of 0 tells aubio to use the file's own sample rate
s = source(your_file, 0, hop_s)
samplerate = s.samplerate

tolerance = 0.8

pitch_o = pitch("yin", win_s, hop_s, samplerate)
pitch_o.set_unit("Hz")  # report pitch in hertz rather than MIDI note numbers
pitch_o.set_tolerance(tolerance)

pitches = []
confidences = []

total_frames = 0
while True:
    samples, read = s()
    pitches.append(pitch_o(samples)[0])
    confidences.append(pitch_o.get_confidence())
    total_frames += read
    if read < hop_s:
        break

print("Average frequency = " + str(np.mean(pitches)) + " Hz")

Be sure to check the docs on pitch detection methods.
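One caveat with averaging every frame: frames where no pitch was found can drag the mean down. Here is a minimal sketch of a more careful average, assuming such frames are reported as 0 and that a higher confidence value means a more reliable estimate (both assumptions are worth verifying against the aubio docs). The helper name `average_pitch`, its `min_confidence` parameter, and the sample values are made up for illustration.

```python
import numpy as np

def average_pitch(pitches, confidences, min_confidence=0.0):
    """Mean of the pitch estimates, skipping frames with no detected pitch.

    Hypothetical helper: assumes undetected frames are reported as 0.
    """
    pitches = np.asarray(pitches, dtype=float)
    confidences = np.asarray(confidences, dtype=float)
    keep = (pitches > 0) & (confidences >= min_confidence)
    if not keep.any():
        return float("nan")  # nothing usable to average
    return float(pitches[keep].mean())

# Made-up values for illustration only: two frames have no pitch (0.0)
pitches = [0.0, 440.0, 442.0, 0.0, 438.0]
confidences = [0.0, 0.9, 0.95, 0.0, 0.85]

print(average_pitch(pitches, confidences))  # 440.0 instead of the plain mean 264.0
```

In practice you would pass in the `pitches` and `confidences` lists collected by the loop above.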

I also thought you might be interested in estimating the mean frequency and some other audio parameters without using any special libraries. Let's just use numpy! This should give you much better insight into how such audio features can be calculated. It's based on specprop from the seewave package. Check its docs for the meaning of the computed features.

import numpy as np

def spectral_properties(y: np.ndarray, fs: int) -> dict:
    # Magnitude spectrum and the frequency (Hz) of each bin
    spec = np.abs(np.fft.rfft(y))
    freq = np.fft.rfftfreq(len(y), d=1 / fs)
    # Normalise so the amplitudes sum to 1 and can be used as weights
    amp = spec / spec.sum()
    mean = (freq * amp).sum()
    sd = np.sqrt(np.sum(amp * ((freq - mean) ** 2)))
    amp_cumsum = np.cumsum(amp)
    # Quantiles: first bin whose cumulative amplitude exceeds the threshold
    # (no "+ 1" here, because Python indexing is 0-based, unlike R)
    median = freq[len(amp_cumsum[amp_cumsum <= 0.5])]
    mode = freq[amp.argmax()]
    Q25 = freq[len(amp_cumsum[amp_cumsum <= 0.25])]
    Q75 = freq[len(amp_cumsum[amp_cumsum <= 0.75])]
    IQR = Q75 - Q25
    # Skewness and kurtosis of the normalised amplitude values
    z = amp - amp.mean()
    w = amp.std()
    skew = ((z ** 3).sum() / (len(spec) - 1)) / w ** 3
    kurt = ((z ** 4).sum() / (len(spec) - 1)) / w ** 4

    result_d = {
        'mean': mean,
        'sd': sd,
        'median': median,
        'mode': mode,
        'Q25': Q25,
        'Q75': Q75,
        'IQR': IQR,
        'skew': skew,
        'kurt': kurt
    }

    return result_d
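To see what the 'mean' value represents, here is a small self-contained sketch. It is not the function above, just its mean calculation pulled out into a hypothetical helper called `mean_frequency`, applied to a made-up signal of two equally loud tones. The mean is an amplitude-weighted average of all bin frequencies, so it lands between the two tones and need not match any frequency that is actually present.

```python
import numpy as np

def mean_frequency(y, fs):
    """Amplitude-weighted mean frequency (Hz) of signal y sampled at fs Hz."""
    spec = np.abs(np.fft.rfft(y))
    freq = np.fft.rfftfreq(len(y), d=1 / fs)
    amp = spec / spec.sum()          # normalise so the weights sum to 1
    return (freq * amp).sum()

# Synthetic signal: one second of a 200 Hz tone plus a 600 Hz tone
fs = 8000
t = np.arange(fs) / fs
y = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 600 * t)

print(mean_frequency(y, fs))         # very close to 400 Hz
```

For a single window of a real recording, pass the slice of samples for that window together with the file's sample rate.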

answered Sep 18 '22 14:09

Lukasz Tracewski
