I've trained a convolutional neural network that I would like others to be able to use without requiring hard-to-install libraries such as Theano (which I have found trivial to install on Linux, but very hard on Windows).
I've written an implementation using Numpy/Scipy that is almost fast enough, but it would be even better if it were two or three times faster.
90% of the time is spent in the following line:
conv_out = np.sum([scipy.signal.convolve2d(x[i],W[f][i],mode='valid') for i in range(num_in)], axis=0)
This line gets called 32 times (once for each feature map), and num_in is 16 (the number of feature maps in the previous layer), so overall the line is slow because it results in 32*16 = 512 calls to the convolve2d routine.
x[i] is only 25x25, and W[f][i] is 2x2.
Is there a better way of expressing this type of convolutional layer in Numpy/Scipy that would execute faster?
(I am only using this code to apply an already-trained network, so I do not have many images to process in parallel.)
The full code for doing a timing experiment is:
import numpy as np
import scipy.signal
from time import time

def max_pool(x):
    """Return the maximum in each 2x2 group of an (N, h, w) image."""
    N, h, w = x.shape
    # The four slices pick out the four corners of every 2x2 block.
    return np.amax([x[:, (i >> 1) & 1::2, i & 1::2] for i in range(4)], axis=0)

def conv_layer(params, x):
    """Apply a convolutional layer (W, biases), then 2x2 max pooling, then ReLU."""
    W, biases = params
    num_in = W.shape[1]
    A = []
    for f, bias in enumerate(biases):
        conv_out = np.sum([scipy.signal.convolve2d(x[i], W[f][i], mode='valid')
                           for i in range(num_in)], axis=0)
        A.append(conv_out + bias)
    x = np.array(A)
    x = max_pool(x)
    return np.maximum(x, 0)

W = np.random.randn(32, 16, 2, 2).astype(np.float32)  # 32 output maps, 16 input channels, 2x2 kernels
b = np.random.randn(32).astype(np.float32)
I = np.random.randn(16, 25, 25).astype(np.float32)
t0 = time()
O = conv_layer((W, b), I)
print(time() - t0)
This prints 0.084 seconds at the moment.
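To confirm that this line dominates, it can be timed in isolation; the following harness is a sketch of mine rather than part of the original experiment:

import timeit

# Time the 32 x 16 = 512 convolve2d calls on their own (the f=0 weights are
# reused for every feature map, which doesn't change the cost being measured).
t = timeit.timeit(
    lambda: np.sum([scipy.signal.convolve2d(I[i], W[0][i], mode='valid')
                    for i in range(16)], axis=0),
    number=32)
print(t)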
Using mplf's suggestion:
# The four views are the four taps of the 2x2 kernel; convolve2d flips the
# kernel, so W[f,i,0,0] multiplies the bottom-right shift a = x[:,1:,1:].
d = x[:, :-1, :-1]
c = x[:, :-1, 1:]
b = x[:, 1:, :-1]
a = x[:, 1:, 1:]
for f, bias in enumerate(biases):
    conv_out = np.sum([a[i]*W[f, i, 0, 0] + b[i]*W[f, i, 0, 1]
                       + c[i]*W[f, i, 1, 0] + d[i]*W[f, i, 1, 1]
                       for i in range(num_in)], axis=0)
I get 0.075s, which is slightly faster.
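To verify the rewrite is equivalent (including the kernel flip that convolve2d performs), the two inner computations can be compared directly; this check is mine, not from mplf's answer, and assumes x, W, num_in and the views a, b, c, d from the snippet above:

# Compare the shifted-view sum against convolve2d for one feature map.
f = 0
ref = np.sum([scipy.signal.convolve2d(x[i], W[f][i], mode='valid')
              for i in range(num_in)], axis=0)
new = np.sum([a[i]*W[f, i, 0, 0] + b[i]*W[f, i, 0, 1]
              + c[i]*W[f, i, 1, 0] + d[i]*W[f, i, 1, 1]
              for i in range(num_in)], axis=0)
assert np.allclose(ref, new, atol=1e-4)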
Building on mplf's suggestion, I've found it is possible to remove both for loops and the calls to convolve2d entirely:
# After swapaxes each view has shape (24, 16, 24), so W[:,:,j,k].dot(view)
# contracts over the 16 input channels and yields a (32, 24, 24) result.
d = x[:, :-1, :-1].swapaxes(0, 1)
c = x[:, :-1, 1:].swapaxes(0, 1)
b = x[:, 1:, :-1].swapaxes(0, 1)
a = x[:, 1:, 1:].swapaxes(0, 1)
x = (W[:, :, 0, 0].dot(a) + W[:, :, 0, 1].dot(b)
     + W[:, :, 1, 0].dot(c) + W[:, :, 1, 1].dot(d)
     + biases.reshape(-1, 1, 1))
This is 10 times faster than the original code.
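For reference, the same contraction can also be written as a single np.einsum over stacked shifted views; this is a sketch of mine (the name conv_einsum is hypothetical) and I have not timed it against the .dot version above:

def conv_einsum(W, x, biases):
    # convolve2d flips the kernel, so W[..., j, k] pairs with x shifted by (1-j, 1-k).
    C, H, Wid = x.shape
    patches = np.empty((C, 2, 2, H - 1, Wid - 1), dtype=x.dtype)
    for j in (0, 1):
        for k in (0, 1):
            patches[:, j, k] = x[:, 1 - j:H - j, 1 - k:Wid - k]
    # Contract over channels and kernel taps in one call: (F,C,2,2) x (C,2,2,h,w) -> (F,h,w)
    return np.einsum('fcjk,cjkhw->fhw', W, patches) + biases[:, None, None]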
With this new code, the max pool stage now takes 50% of the time. This can also be sped up by using:
def max_pool(x):
    """Return the maximum in each 2x2 group of an (N, h, w) image."""
    N, h, w = x.shape
    # Gather each 2x2 block into the last axis, then reduce over it.
    x = x.reshape(N, h // 2, 2, w // 2, 2).swapaxes(2, 3).reshape(N, h // 2, w // 2, 4)
    return np.amax(x, axis=3)
This speeds up the max_pool step by a factor of 10, so overall the program doubles in speed again.
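A quick check (mine, not part of the original timings) confirms the reshape-based pooling matches the original slice-based version exactly:

x = np.random.randn(32, 24, 24).astype(np.float32)
by_slices = np.amax([x[:, (i >> 1) & 1::2, i & 1::2] for i in range(4)], axis=0)
assert np.array_equal(max_pool(x), by_slices)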