My code calls a "difference function" many times to compute the "YIN algorithm" (a fundamental frequency extractor).
The difference function (eq. 6 in the paper) is defined as:

d_t(tau) = sum_{j=1..W} (x_j - x_{j+tau})**2
And this is my implementation of the difference function:
def differenceFunction(x, W, tau_max):
    df = [0] * tau_max
    for tau in range(1, tau_max):
        for j in range(0, W - tau):
            # convert to a Python int so squaring the int16 difference cannot overflow
            tmp = int(x[j] - x[j + tau])
            df[tau] += tmp * tmp
    return df
For instance with:
import numpy as np

x = np.random.randint(0, high=32000, size=2048, dtype='int16')
W = 2048
tau_max = 106
differenceFunction(x, W, tau_max)
Is there a way to optimize this double-loop computation (with Python only, preferably with no libraries other than NumPy)?
EDIT: Changed code to avoid Index Error (j loop, @Elliot answer)
EDIT2: Changed code to use x[0] (j loop, @hynekcer comment)
EDIT: Improved speed to 220 µs - see edit at the end - direct version
The required calculation can be easily evaluated by the autocorrelation function or similarly by convolution. The Wiener–Khinchin theorem allows computing the autocorrelation with two Fast Fourier transforms (FFT), with time complexity O(n log n). I use the accelerated convolution function fftconvolve from the SciPy package. An advantage is that it is easy to explain here why it works. Everything is vectorized, with no loops at the Python interpreter level.
from scipy.signal import fftconvolve
import numpy as np

def difference_by_convol(x, W, tau_max):
    x = np.array(x, np.float64)
    w = x.size
    # cumulative sum of squares, with a leading zero so x_cumsum[i] == sum(x[:i] ** 2)
    x_cumsum = np.concatenate((np.array([0.]), (x * x).cumsum()))
    # autocorrelation via FFT: correlate(x, x) == convolve(x, reversed(x))
    conv = fftconvolve(x, x[::-1])
    df = x_cumsum[w:0:-1] + x_cumsum[w] - x_cumsum[:w] - 2 * conv[w - 1:]
    return df[:tau_max + 1]
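As a quick sanity check (this snippet is my addition, not part of the original answer), the convolution-based version can be compared against the naive double loop on the question's example data; it assumes both functions above are already defined:

import numpy as np

x = np.random.randint(0, high=32000, size=2048, dtype='int16')
W, tau_max = 2048, 106

df_naive = differenceFunction(x, W, tau_max)   # exact integers, length tau_max
df_fast = difference_by_convol(x, W, tau_max)  # floats, length tau_max + 1

# Lags 1..tau_max-1 agree to roughly machine precision (lag 0 is 0 in both).
print(np.allclose(df_naive[1:], df_fast[1:tau_max], rtol=1e-12))  # True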
Notes:

- Comparison with the differenceFunction_1loop function in Elliot's answer: it is faster with FFT: 430 µs compared to the original 1170 µs. It starts to be faster at about tau_max >= 40. The numerical accuracy is great; the highest relative error is less than 1E-14 compared to the exact integer result. (Therefore it could easily be rounded to the exact long integer solution.)
- tau_max is not important for the algorithm; it only restricts the output at the end. A zero element at index 0 is added to the output because indexes should start at 0 in Python.
- W is not important in Python; the size is better introspected from the data.
- correlate(x, x) = convolve(x, reversed(x)).
- convolve "automatically chooses this method or the direct method based on an estimation of which is faster." That heuristic is not adequate for this case because the convolution evaluates many more tau values than tau_max, and it must be outweighed by the FFT being much faster than a direct method.

Proof: (for Pythonistas :-)
The original naive implementation can be written as:
df = [sum((x[j] - x[j + t]) ** 2 for j in range(w - t)) for t in range(tau_max + 1)]
where tau_max < w.
Expand the square by the rule (a - b)**2 == a**2 + b**2 - 2 * a * b:
df = [sum(x[j] ** 2 for j in range(w - t))
      + sum(x[j] ** 2 for j in range(t, w))
      - 2 * sum(x[j] * x[j + t] for j in range(w - t))
      for t in range(tau_max + 1)]
Substitute the first two sums with the help of x_cumsum = [sum(x[j] ** 2 for j in range(i)) for i in range(w + 1)], which can easily be calculated in linear time. Substitute sum(x[j] * x[j + t] for j in range(w - t)) by the convolution conv = fftconvolve(x, reversed(x), mode='full'), which has output of size len(x) + len(x) - 1.
df = [x_cumsum[w - t] + x_cumsum[w] - x_cumsum[t]
      - 2 * convolve(x, x[::-1])[w - 1 + t]
      for t in range(tau_max + 1)]
Optimize by vector expressions:
df = x_cumsum[w:0:-1] + x_cumsum[w] - x_cumsum[:w] - 2 * conv[w - 1:]
Every step can also be tested and compared numerically on test data.
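For example, a short script along these lines (my addition; the names df_naive, df_expanded and df_vector are mine) confirms that all three forms above agree on random data:

import numpy as np
from scipy.signal import fftconvolve

rng = np.random.RandomState(0)
x = rng.randint(0, 100, size=64).astype(np.float64)
w, tau_max = x.size, 20

# Step 1: the naive definition.
df_naive = [sum((x[j] - x[j + t]) ** 2 for j in range(w - t))
            for t in range(tau_max + 1)]

# Step 2: squares expanded by (a - b)**2 == a**2 + b**2 - 2 * a * b.
df_expanded = [sum(x[j] ** 2 for j in range(w - t))
               + sum(x[j] ** 2 for j in range(t, w))
               - 2 * sum(x[j] * x[j + t] for j in range(w - t))
               for t in range(tau_max + 1)]

# Step 3: cumulative sums of squares plus one FFT convolution, fully vectorized.
x_cumsum = np.concatenate((np.array([0.]), (x * x).cumsum()))
conv = fftconvolve(x, x[::-1])
df_vector = (x_cumsum[w:0:-1] + x_cumsum[w] - x_cumsum[:w] - 2 * conv[w - 1:])[:tau_max + 1]

print(np.allclose(df_naive, df_expanded), np.allclose(df_naive, df_vector))  # True True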
EDIT: Implemented a solution directly with NumPy's FFT.
import numpy as np

def difference_fft(x, W, tau_max):
    x = np.array(x, np.float64)
    w = x.size
    tau_max = min(tau_max, w)
    x_cumsum = np.concatenate((np.array([0.]), (x * x).cumsum()))
    # pad to a "nice" FFT length (small prime factors) not much larger than w + tau_max
    size = w + tau_max
    p2 = (size // 32).bit_length()
    nice_numbers = (16, 18, 20, 24, 25, 27, 30, 32)
    size_pad = min(n * 2 ** p2 for n in nice_numbers if n * 2 ** p2 >= size)
    # autocorrelation by Wiener-Khinchin: inverse FFT of the power spectrum
    fc = np.fft.rfft(x, size_pad)
    conv = np.fft.irfft(fc * fc.conjugate())[:tau_max]
    return x_cumsum[w:w - tau_max:-1] + x_cumsum[w] - x_cumsum[:tau_max] - 2 * conv
It is more than twice as fast as my previous solution because the length of the convolution is restricted to the nearest "nice" number with small prime factors after W + tau_max, instead of evaluating the full 2 * W. It is also not necessary to transform the same data twice, as it was with fftconvolve(x, reversed(x)).
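For the question's sizes, the padding choice can be seen with a small worked example (my addition, just re-running the padding logic from the function above):

w, tau_max = 2048, 106
size = w + tau_max                      # 2154 samples must fit without circular wrap-around
p2 = (size // 32).bit_length()          # 2154 // 32 == 67, which has bit length 7
nice_numbers = (16, 18, 20, 24, 25, 27, 30, 32)
size_pad = min(n * 2 ** p2 for n in nice_numbers if n * 2 ** p2 >= size)
print(size_pad)  # 2304 == 2**8 * 3**2, well below the ~2 * W = 4096 a full convolution would use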
In [211]: %timeit differenceFunction_1loop(x, W, tau_max)
1.1 ms ± 4.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [212]: %timeit difference_by_convol(x, W, tau_max)
431 µs ± 5.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [213]: %timeit difference_fft(x, W, tau_max)
218 µs ± 685 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The newest solution, difference_fft, is faster than difference_by_convol for tau_max >= 20. That ratio doesn't depend much on data size because of the similar ratio of overhead costs.
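For reference, a minimal way to reproduce the comparison for the two functions defined in this answer (my addition; differenceFunction_1loop from Elliot's answer is not included here, and absolute numbers will differ by machine):

import timeit
import numpy as np

x = np.random.randint(0, high=32000, size=2048, dtype='int16')
W, tau_max = 2048, 106

for fn in (difference_by_convol, difference_fft):
    t = timeit.timeit(lambda: fn(x, W, tau_max), number=1000)
    print(fn.__name__, t / 1000 * 1e6, "µs per call")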