I'd like to improve the performance of convolution in Python, and I'm hoping for some insight on how best to go about it.
I am currently using scipy to perform the convolution, with code somewhat like the snippet below:
import numpy
import scipy.signal
import timeit

a = numpy.arange(1000000).reshape(1000, 1000)
filt = numpy.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]])

def convolve():
    global a, filt
    scipy.signal.convolve2d(a, filt, mode="same")

t = timeit.Timer("convolve()", "from __main__ import convolve")
print("%.2f sec/pass" % (10 * t.timeit(number=10) / 100))
I am processing grayscale image data (integer values between 0 and 255), and I currently get about a quarter of a second per convolution. My thinking was to do one of the following:
- Use corepy, preferably with some optimizations
- Recompile NumPy with icc & MKL
- Use python-cuda
I was wondering if anyone has experience with any of these approaches (what sort of gain would be typical, and whether it is worth the time), or if anyone knows of a better library for performing convolution with NumPy.
Thanks!
EDIT:
Speed-up of about 10x by rewriting the Python loop in C, compared with using NumPy.
Cython can also give drastic speed increases at runtime by explicitly declaring the data types of variables and arrays: with an explicitly typed "ndarray" (or typed memoryview), loops over NumPy arrays that would otherwise run in the interpreter are compiled to C, and array processing has been reported to get as much as 1250x faster.
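As an illustration only, here is a minimal sketch of what such explicitly typed code might look like, written in Cython's "pure Python" mode so the file stays valid Python syntax; the function name laplacian8 and the typed-memoryview annotations are my own choices rather than anything from a specific tutorial, and the module has to be compiled (e.g. with cythonize) before the type declarations pay off:
import cython

@cython.boundscheck(False)   # skip bounds checks inside the loops when compiled
@cython.wraparound(False)    # disallow negative indexing so indexing becomes plain pointer math
def laplacian8(a: cython.double[:, :], out: cython.double[:, :]) -> None:
    # 3x3 "sum of the 8 neighbours minus 8*centre" filter over the interior pixels
    i: cython.Py_ssize_t
    j: cython.Py_ssize_t
    for i in range(1, a.shape[0] - 1):
        for j in range(1, a.shape[1] - 1):
            out[i, j] = (a[i - 1, j - 1] + a[i - 1, j] + a[i - 1, j + 1]
                         + a[i, j - 1] - 8.0 * a[i, j] + a[i, j + 1]
                         + a[i + 1, j - 1] + a[i + 1, j] + a[i + 1, j + 1])
Uncompiled, the decorators and annotations are essentially no-ops and it runs as ordinary (slow) Python; compiled, the two inner loops turn into plain C loops over the buffers.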
Using CuPy is a great way to accelerate NumPy-style array and matrix operations on the GPU by many times. It's important to note that the speed-ups you'll get are highly dependent on the size of the array you're working with.
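A minimal sketch of what that might look like for this particular filter, assuming a CUDA-capable GPU and a CuPy build that ships cupyx.scipy.ndimage (the dtype and mode choices here are just illustrative):
import numpy
import cupy
from cupyx.scipy import ndimage as cp_ndimage

a = numpy.arange(1000000, dtype=numpy.float32).reshape(1000, 1000)
filt = numpy.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]], dtype=numpy.float32)

a_gpu = cupy.asarray(a)        # copy the image to GPU memory
filt_gpu = cupy.asarray(filt)
out_gpu = cp_ndimage.convolve(a_gpu, filt_gpu, mode="constant")  # runs on the GPU
out = cupy.asnumpy(out_gpu)    # copy the result back to the host
For a single 1000x1000 image the host-to-GPU copies can easily dominate the runtime, so the win is largest when the data stays on the GPU across many operations.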
Numba is claimed to be the fastest of these, around 10 times faster than NumPy for this kind of loop-heavy code. Julia is claimed by its developers to be a very fast language.
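For reference, a minimal Numba sketch for this 3x3 kernel might look like the following; the function name and loop structure are my own, and the first call includes JIT compilation time, so it should be called once before timing it:
import numpy
from numba import njit

@njit(cache=True)
def convolve3x3(a, filt):
    # naive "same"-size 3x3 convolution; Numba compiles these loops to machine code
    rows, cols = a.shape
    out = numpy.zeros((rows, cols), dtype=numpy.float64)
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            acc = 0.0
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    # flip the kernel indices for a true convolution
                    # (irrelevant here, since this kernel is symmetric)
                    acc += a[i + di, j + dj] * filt[1 - di, 1 - dj]
            out[i, j] = acc
    return out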
Takeaway: NumPy provides highly-optimized functions for performing mathematical operations on arrays of numbers.
The code in scipy for doing 2d convolutions is a bit messy and unoptimized. See http://svn.scipy.org/svn/scipy/trunk/scipy/signal/firfilter.c if you want a glimpse into the low-level functioning of scipy.
If all you want is to convolve with a small, constant kernel like the one you showed, a function like this might work:
def specialconvolve(a):
    # sorry, you must pad the input yourself
    # sum each pixel with its vertical neighbours (the rows above and below)
    rowconvol = a[1:-1, :] + a[:-2, :] + a[2:, :]
    # then sum horizontally, and subtract 9*centre to turn the 3x3 box sum
    # into "sum of the 8 neighbours minus 8*centre"
    colconvol = rowconvol[:, 1:-1] + rowconvol[:, :-2] + rowconvol[:, 2:] - 9*a[1:-1, 1:-1]
    return colconvol
This function takes advantage of the separability of the kernel, as DarenW suggested above, as well as the more optimized NumPy arithmetic routines. It's over 1000 times faster than the convolve2d function by my measurements.
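As a quick sanity check of my own (not from the original measurement), zero-padding the input by one pixel with numpy.pad makes the result line up with convolve2d's default zero-filled "same" output:
import numpy
import scipy.signal

a = numpy.arange(1000000, dtype=float).reshape(1000, 1000)
filt = numpy.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]])

padded = numpy.pad(a, 1, mode="constant")             # one pixel of zeros on every side
fast = specialconvolve(padded)                        # same shape as a
slow = scipy.signal.convolve2d(a, filt, mode="same")
print(numpy.allclose(fast, slow))                     # should print True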