I've trained a convolutional neural network that I would like others to be able to use without requiring hard-to-install libraries such as Theano (which I have found trivial to install on Linux, but very hard on Windows).
I've written an implementation using Numpy/Scipy that is almost fast enough, but it would be even better if it were two or three times faster.
90% of the time is spent in the following line:
conv_out = np.sum([scipy.signal.convolve2d(x[i],W[f][i],mode='valid') for i in range(num_in)], axis=0)
This line gets called 32 times (once for each feature map), and num_in is 16 (the number of feature maps in the previous layer), so overall the line is slow because it results in 32*16 = 512 calls to the convolve2d routine.
x[i] is only 25x25, and W[f][i] is 2x2.
Is there a better way of expressing this type of convolutional layer in Numpy/Scipy that would execute faster?
(I am only using this code to apply an already-trained network, so I do not have many images to process in parallel.)
The full code for doing a timing experiment is:
import numpy as np
import scipy.signal
from time import time

def max_pool(x):
    """Return the maximum in each 2x2 group of an (N, h, w) image."""
    N, h, w = x.shape
    # The four slices pick out the four corners of every 2x2 block.
    return np.amax([x[:, (i >> 1) & 1::2, i & 1::2] for i in range(4)], axis=0)

def conv_layer(params, x):
    """Apply a convolutional layer (W, biases), then 2x2 max pooling, then ReLU."""
    W, biases = params
    num_in = W.shape[1]
    A = []
    for f, bias in enumerate(biases):
        conv_out = np.sum([scipy.signal.convolve2d(x[i], W[f][i], mode='valid')
                           for i in range(num_in)], axis=0)
        A.append(conv_out + bias)
    x = np.array(A)
    x = max_pool(x)
    return np.maximum(x, 0)

W = np.random.randn(32, 16, 2, 2).astype(np.float32)  # 32 output maps, 16 input channels, 2x2 kernels
b = np.random.randn(32).astype(np.float32)
I = np.random.randn(16, 25, 25).astype(np.float32)
t0 = time()
O = conv_layer((W, b), I)
print(time() - t0)
This prints 0.084 seconds at the moment.
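To confirm that this line dominates, it can be timed in isolation; the following harness is a sketch of mine rather than part of the original experiment:

import timeit

# Time the 32 x 16 = 512 convolve2d calls on their own (the f=0 weights are
# reused for every feature map, which doesn't change the cost being measured).
t = timeit.timeit(
    lambda: np.sum([scipy.signal.convolve2d(I[i], W[0][i], mode='valid')
                    for i in range(16)], axis=0),
    number=32)
print(t)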
Using mplf's suggestion:
# The four views are the four taps of the 2x2 kernel; convolve2d flips the
# kernel, so W[f,i,0,0] multiplies the bottom-right shift a = x[:,1:,1:].
d = x[:, :-1, :-1]
c = x[:, :-1, 1:]
b = x[:, 1:, :-1]
a = x[:, 1:, 1:]
for f, bias in enumerate(biases):
    conv_out = np.sum([a[i]*W[f, i, 0, 0] + b[i]*W[f, i, 0, 1]
                       + c[i]*W[f, i, 1, 0] + d[i]*W[f, i, 1, 1]
                       for i in range(num_in)], axis=0)
I get 0.075s, which is slightly faster.
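To verify the rewrite is equivalent (including the kernel flip that convolve2d performs), the two inner computations can be compared directly; this check is mine, not from mplf's answer, and assumes x, W, num_in and the views a, b, c, d from the snippet above:

# Compare the shifted-view sum against convolve2d for one feature map.
f = 0
ref = np.sum([scipy.signal.convolve2d(x[i], W[f][i], mode='valid')
              for i in range(num_in)], axis=0)
new = np.sum([a[i]*W[f, i, 0, 0] + b[i]*W[f, i, 0, 1]
              + c[i]*W[f, i, 1, 0] + d[i]*W[f, i, 1, 1]
              for i in range(num_in)], axis=0)
assert np.allclose(ref, new, atol=1e-4)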
Building on mplf's suggestion, I've found it is possible to remove both for loops and the calls to convolve2d entirely:
# After swapaxes each view has shape (24, 16, 24), so W[:,:,j,k].dot(view)
# contracts over the 16 input channels and yields a (32, 24, 24) result.
d = x[:, :-1, :-1].swapaxes(0, 1)
c = x[:, :-1, 1:].swapaxes(0, 1)
b = x[:, 1:, :-1].swapaxes(0, 1)
a = x[:, 1:, 1:].swapaxes(0, 1)
x = (W[:, :, 0, 0].dot(a) + W[:, :, 0, 1].dot(b)
     + W[:, :, 1, 0].dot(c) + W[:, :, 1, 1].dot(d)
     + biases.reshape(-1, 1, 1))
This is 10 times faster than the original code.
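For reference, the same contraction can also be written as a single np.einsum over stacked shifted views; this is a sketch of mine (the name conv_einsum is hypothetical) and I have not timed it against the .dot version above:

def conv_einsum(W, x, biases):
    # convolve2d flips the kernel, so W[..., j, k] pairs with x shifted by (1-j, 1-k).
    C, H, Wid = x.shape
    patches = np.empty((C, 2, 2, H - 1, Wid - 1), dtype=x.dtype)
    for j in (0, 1):
        for k in (0, 1):
            patches[:, j, k] = x[:, 1 - j:H - j, 1 - k:Wid - k]
    # Contract over channels and kernel taps in one call: (F,C,2,2) x (C,2,2,h,w) -> (F,h,w)
    return np.einsum('fcjk,cjkhw->fhw', W, patches) + biases[:, None, None]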
With this new code, the max pool stage now takes 50% of the time. This can also be sped up by using:
def max_pool(x):
    """Return the maximum in each 2x2 group of an (N, h, w) image."""
    N, h, w = x.shape
    # Gather each 2x2 block into the last axis, then reduce over it.
    x = x.reshape(N, h // 2, 2, w // 2, 2).swapaxes(2, 3).reshape(N, h // 2, w // 2, 4)
    return np.amax(x, axis=3)
This speeds up the max_pool step by a factor of 10, so overall the program doubles in speed again.
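A quick check (mine, not part of the original timings) confirms the reshape-based pooling matches the original slice-based version exactly:

x = np.random.randn(32, 24, 24).astype(np.float32)
by_slices = np.amax([x[:, (i >> 1) & 1::2, i & 1::2] for i in range(4)], axis=0)
assert np.array_equal(max_pool(x), by_slices)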