scipy.signal.resample behaves strangely

I am currently working on some signal processing (using scipy), but I encountered a strange problem and can't figure out what's wrong. Namely, I am reading some audio data from a .wav file, but have to resample before further processing. The signal has more than 500,000 samples.

Now, scipy.signal.resample takes more than 10 minutes on just one of the channels. OK, I thought, this might be normal because there are a lot of samples. However, I then decided to experiment with two other "signals" (a randomly generated array of numbers and an array of zeros), each with 1,000,000 samples, and resample those. Strangely, resampling in this case takes only a few milliseconds, so the size alone is obviously not the problem.

My final experiment was extracting the zeros from my original signal (there are about 50,000 samples that are zero-valued) and resampling them. I was totally surprised to see that resampling only 50,000 zeros takes about a minute. Previously, I resampled an array of zeros that had 1,000,000 samples in a few milliseconds and now I have to wait about a minute for an array of 50,000 samples. Something has to be wrong, but I can't figure out what.

I really don't see any reason for this behavior; especially the zeros (1,000,000 and just a few milliseconds vs 50,000 and a minute) surprise me a lot.

Here's some sample code, so that you know what I'm talking about:

import scipy.io.wavfile as wavfile
import numpy
import scipy.signal as signal

sample_rate, signal_data = wavfile.read('file.wav')

test_channel = numpy.array(signal_data[:,0], dtype=float)
channel_zeros = numpy.array(signal_data[numpy.where(signal_data[:,0]==0)[0],0], dtype=float)
test_signal = numpy.random.rand((1000000))
test_signal_2 = numpy.zeros((1000000))

number_of_samples = 500

#both of these are executed in less than a second
resampled_random = signal.resample(test_signal, number_of_samples)
resampled_zeros = signal.resample(test_signal_2, number_of_samples)

#this takes minutes
resampled_original_signal = signal.resample(test_channel, number_of_samples)

#this takes about a minute
resampled_original_zeros = signal.resample(channel_zeros, number_of_samples)

Do you have any idea what might be wrong with this? Thanks in advance.

Alex Mitrevski asked Nov 17 '13 23:11


1 Answer

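The numpy/scipy implementation of FFT (based on FFTPACK) is fastest when the length of the data is a power of 2 (e.g. 2, 4, 8, 16, 32) and slowest when it is a prime or has large prime factors. Only the length matters, not the values: 1,000,000 = 2^6 * 5^6 factors into small primes, so both of your test arrays resample quickly, while the length of your channel (and the number of zeros you extracted from it) presumably has a large prime factor. To speed up processing of the signal, you can zero-pad the data to a power of 2 length.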
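If you want to check whether a given length is likely to be a slow one, you can look at its largest prime factor. This is only a rough sketch: largest_prime_factor is a hypothetical helper (not part of numpy or scipy), and it only tells you how "FFT-friendly" a length is, not how long the resample will actually take.

```python
def largest_prime_factor(n):
    """Return the largest prime factor of an integer n >= 2 (trial division)."""
    largest = 1
    factor = 2
    while factor * factor <= n:
        # Divide out every occurrence of this factor before moving on.
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    # Whatever is left (if > 1) is a prime larger than all factors found so far.
    if n > 1:
        largest = n
    return largest


# Lengths whose largest prime factor is small are the fast ones.
for length in (1000000, 2 ** 19):
    print(length, largest_prime_factor(length))
```

You could run it on len(test_channel) and len(channel_zeros) from the question; a large result there would be consistent with the slow timings.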

In Python you can use the following code to find the smallest power of 2 that is greater than or equal to a given number:

# n must be a positive Python int; an exact power of 2 is returned unchanged
nextpow2 = 1 << (n - 1).bit_length()

You can use this with numpy.pad to pad your data array to this size:

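import numpy as np
import scipy.io.wavfile as wavfile

sample_rate, signal_data = wavfile.read('file.wav')
n = signal_data.shape[0]

# smallest power of 2 that is >= n (an integer, as np.pad requires)
nextpow2 = 1 << (n - 1).bit_length()

# append zeros along the sample axis only; the channel axis is left unchanged
signal_data = np.pad(signal_data, ((0, nextpow2 - n), (0, 0)), mode='constant')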
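One thing to keep in mind: padding makes the signal longer, so if you still ask resample for the same number of output samples, they get spread over the padded length too. A minimal sketch of one way to deal with that for a single channel is below. The helpers next_pow2 and resample_padded are hypothetical names of my own, and rounding the scaled output length means the sample spacing is only approximately preserved. The zeros at the end can also add some edge ringing, since resample treats the input as periodic.

```python
import numpy as np
from scipy import signal


def next_pow2(n):
    """Smallest power of 2 that is >= n (n must be a positive integer)."""
    return 1 << (int(n) - 1).bit_length()


def resample_padded(x, num):
    """Resample a 1-D array to `num` samples via a power-of-2-length FFT."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    padded_len = next_pow2(n)

    # Zero-pad at the end so the FFT runs on a power-of-2 length.
    padded = np.pad(x, (0, padded_len - n), mode='constant')

    # Ask for proportionally more output samples so that the spacing between
    # them stays roughly what it would have been without the padding.
    padded_num = int(round(num * padded_len / n))
    resampled = signal.resample(padded, padded_num)

    # The first `num` samples cover the original (unpadded) part of the signal.
    return resampled[:num]
```

With the variables from the question this would be called as resample_padded(test_channel, number_of_samples).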

For more background on scipy/numpy and FFT in general, see this question.

mfitzp answered Oct 18 '22 00:10