I have a large time series, say 1e10, that results from recording neural activity, i.e. voltages. Before doing further analysis I want to band pass filter that data between 300 Hz and 7000 Hz. Below, I post the code for the Butterworth filter I designed. How do I make this filter faster? It takes too long to run.
Update: Sample Data
Here is a link to data such as would be in each row of data
.
As to formatting, each row represents a different recording source and each column represents a point in time. The data are sampled at 20, 000 Hz.
def butter_bandpass(lowcut,highcut,fs,order=8):
nyq = 0.5*fs
low = lowcut/nyq
high = highcut/nyq
b,a = butter(order, [low, high], btype='band')
return b,a
def butter_bandpass_filter(data,lowcut,highcut,fs,order=8):
b,a = butter_bandpass(lowcut,highcut,fs,order=order)
return lfilter(b,a,data)
print 'Band-pass filtering data'
data = map(lambda channel: butter_bandpass_filter(channel,300,7000,20000),data)
Update
Like most NumPy, SciPy functions lfilter
can take a multidimensional input and so map
creates unnecessary overhead. That is, one can rewrite
data = map(lambda channel:butter_bandpass_filter(channel,300,7000,20000),data)
as
data = butter_bandpass_filter(data,300,7000,20000)
By default lfilter
operates on the last non-singleton axis. For a 2D matrix, this means the function is applied to each row, which is exactly what I need. Sadly, it's still too slow though.
First, your data sample is in a proprietary format, am I right? Even using the biosig toolbox for Python this format cannot be read. Maybe I'm wrong, but I didn't succeed to read it.
Thus, I'll base my answer on artificial data, generated from a Rössler-oscillator. It is a chaotic, 3d-oscillator, often used in the field of nonlinear timeseries analysis.
import numpy as np
from scipy.signal import butter, lfilter
##############################################################
# For generating sample-data
##############################################################
from scipy.integrate import odeint
def roessler_ode(y,t,omega=1,a=0.165,b=0.2,c=10):
dy = np.zeros((3))
dy[0] = -1.0*(omega*y[1] + y[2]) #+ e1*(y[3]-y[0])
dy[1] = omega * y[0] + a * y[1]
dy[2] = b + y[2] * (y[0] - c)
return dy
class Roessler(object):
"""A single coupled Roessler oscillators"""
def __init__(self, y=None, omega=1.0, a=0.165,b=0.2,c=10):
self.omega = omega
self.a = a
self.b = b
self.c = c
if y==None:
self.y = np.random.random((3))+0.5
else:
self.y = y
def ode(self,y,t):
dy = roessler_ode(y[:],t,self.omega,self.a,self.b,self.c)
return dy
def integrate(self,ts):
rv = odeint(self.ode,self.y,ts)
self.y = rv[-1,:]
return rv
###############################################################
Now come your function definitions:
def butter_bandpass(lowcut,highcut,fs,order=8):
nyq = 0.5*fs
low = lowcut/nyq
high = highcut/nyq
b,a = butter(order, [low, high], btype='band')
return b,a
def butter_bandpass_filter(data,lowcut,highcut,fs,order=8):
b,a = butter_bandpass(lowcut,highcut,fs,order=order)
return lfilter(b,a,data)
I left them unchanged. I generate some data with my oscillator, but I only take the 3rd component of it. I add some gaussian noise in order to have something to filter out.
# generate sample data
data = Roessler().integrate(np.arange(0,1000,0.1))[:,2]
data += np.random.normal(size=data.shape)
Now let's come to the speed question. I use the timeit
-module to check execution times. These statements execute the filtering 100 times, and measure the overall time. I do this measurement for order 2 and order 8 (yes, you want sharper spectral edges, I know, but wait)
# time execution
from timeit import timeit
time_order8 = timeit("butter_bandpass_filter(data,300,2000,20000,8)", "from __main__ import butter_bandpass_filter, butter_bandpass, data", number=100)
time_order2 = timeit("butter_bandpass_filter(data,300,2000,20000,2)", "from __main__ import butter_bandpass_filter, butter_bandpass, data", number=100)
print "For order 8: %.2f seconds" % time_order8
print "For order 2: %.2f seconds" % time_order2
The output of these two print statements is:
For order 8: 11.70 seconds
For order 2: 0.54 seconds
This makes a factor of 20! Using order 8 for a butterworth filter is definitely not a good idea. I cannot think of any situation where this would make sense. To prove the other problems that arise when using such a filter, let's look at the effect of those filters. We apply bandpass filtering to our data, once with order 8 and once with order 2:
data_bp8 = butter_bandpass_filter(data,300,2000,20000,8)
data_bp2 = butter_bandpass_filter(data,300,2000,20000,2)
Now let's do some plots. First, simple lines (I didn't care about the x-axis)
# plot signals
import matplotlib.pyplot as plt
plt.figure(1)
plt.plot(data, label="raw")
plt.plot(data_bp8, label="order 8")
plt.plot(data_bp2, label="order 2")
plt.legend()
This gives us:
Oh, what happened to the green line? Weird, isn't it? The reason is that butterworth filters of order 8 become a rather instable thing. Ever heard of resonance disaster / catastrophe? This is what it looks like.
The power spectral densities of these signals can be plotted like:
# plot power spectral densities
plt.figure(2)
plt.psd(data, Fs=200000, label="raw")
plt.psd(data_bp8, Fs=20000, label="order 8")
plt.psd(data_bp2, Fs=20000, label="order 2")
plt.legend()
plt.show()
Here, you see that the green line has sharper edges, but at what price? Artificial peak at approx. 300 Hz. The signal is completely distorted.
So what shall you do?
Another hint: if you care about the phase of your signal, you should definitely filter forwards and backwards in time. lfilter
does shift the phases otherwise. An implementation of such an algorithm, commonly refered to as filtfilt
, can be found at my github repository.
One more programming hint:
If you have the situation of pass-through parameters (four parameters of butter_bandpass_filter
are only passed through to butter_bandpass
, you could make use of *args
and **kwargs
.
def butter_bandpass(lowcut,highcut,fs,order=8):
nyq = 0.5*fs
low = lowcut/nyq
high = highcut/nyq
b,a = butter(order, [low, high], btype='band')
return b,a
def butter_bandpass_filter(data, *args, **kwargs):
b,a = butter_bandpass(*args, **kwargs)
return lfilter(b,a,data)
This reduces code redundancy and will make your code less error-prone.
Finally, here is the complete script, for easy copy-pasting to try it out.
import numpy as np
from scipy.signal import butter, lfilter
##############################################################
# For generating sample-data
##############################################################
from scipy.integrate import odeint
def roessler_ode(y,t,omega=1,a=0.165,b=0.2,c=10):
dy = np.zeros((3))
dy[0] = -1.0*(omega*y[1] + y[2]) #+ e1*(y[3]-y[0])
dy[1] = omega * y[0] + a * y[1]
dy[2] = b + y[2] * (y[0] - c)
return dy
class Roessler(object):
"""A single coupled Roessler oscillators"""
def __init__(self, y=None, omega=1.0, a=0.165,b=0.2,c=10):
self.omega = omega
self.a = a
self.b = b
self.c = c
if y==None:
self.y = np.random.random((3))+0.5
else:
self.y = y
def ode(self,y,t):
dy = roessler_ode(y[:],t,self.omega,self.a,self.b,self.c)
return dy
def integrate(self,ts):
rv = odeint(self.ode,self.y,ts)
self.y = rv[-1,:]
return rv
###############################################################
def butter_bandpass(lowcut,highcut,fs,order=8):
nyq = 0.5*fs
low = lowcut/nyq
high = highcut/nyq
b,a = butter(order, [low, high], btype='band')
return b,a
def butter_bandpass_filter(data,lowcut,highcut,fs,order=8):
b,a = butter_bandpass(lowcut,highcut,fs,order=order)
return lfilter(b,a,data)
# generate sample data
data = Roessler().integrate(np.arange(0,1000,0.1))[:,2]
data += np.random.normal(size=data.shape)
# time execution
from timeit import timeit
time_order8 = timeit("butter_bandpass_filter(data,300,2000,20000,8)", "from __main__ import butter_bandpass_filter, butter_bandpass, data", number=100)
time_order2 = timeit("butter_bandpass_filter(data,300,2000,20000,2)", "from __main__ import butter_bandpass_filter, butter_bandpass, data", number=100)
print "For order 8: %.2f seconds" % time_order8
print "For order 2: %.2f seconds" % time_order2
data_bp8 = butter_bandpass_filter(data,300,2000,20000,8)
data_bp2 = butter_bandpass_filter(data,300,2000,20000,2)
# plot signals
import matplotlib.pyplot as plt
plt.figure(1)
plt.plot(data, label="raw")
plt.plot(data_bp8, label="order 8")
plt.plot(data_bp2, label="order 2")
plt.legend()
# plot power spectral densities
plt.figure(2)
plt.psd(data, Fs=200000, label="raw")
plt.psd(data_bp8, Fs=20000, label="order 8")
plt.psd(data_bp2, Fs=20000, label="order 2")
plt.legend()
plt.show()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With