I have a neural network with the architecture 1024, 512, 256, 1 (the input layer has 1024 units, the output layer has 1 unit, etc.). I would like to train this network using one of the optimization algorithms in scipy.optimize.

The problem is that these algorithms expect the function parameters to be given as a single vector; this means that, in my case, I have to unroll all the weights into a vector of length

1024*512 + 512*256 + 256*1 = 655616

Some algorithms (like fmin_bfgs) need to use identity matrices, so they make a call like

I = numpy.eye(655616)

which, not very surprisingly, produces a MemoryError. Is there any way for me to avoid having to unroll all the weights into one vector, short of adapting the algorithms in scipy.optimize to my own needs?
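For context, the unrolling the scipy.optimize routines expect can be sketched like this, assuming a bias-free network matching the 655,616-parameter count above (the `flatten`/`unflatten` helper names are illustrative, not part of any library):

```python
import numpy as np

# Layer-to-layer weight matrix shapes for the 1024-512-256-1 network
# (no bias terms, matching the 655,616 count in the question).
shapes = [(1024, 512), (512, 256), (256, 1)]

def flatten(weights):
    """Unroll a list of weight matrices into one parameter vector."""
    return np.concatenate([w.ravel() for w in weights])

def unflatten(vec):
    """Rebuild the weight matrices from a flat parameter vector."""
    weights, i = [], 0
    for rows, cols in shapes:
        n = rows * cols
        weights.append(vec[i:i + n].reshape(rows, cols))
        i += n
    return weights

weights = [np.zeros(s) for s in shapes]
theta = flatten(weights)
assert theta.size == 1024*512 + 512*256 + 256*1  # 655616
```

The vector itself is only ~5 MB of float64; it is the dense 655616 × 655616 work matrices inside full BFGS that blow up memory.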
Don't try to fit the weights to a NN using L-BFGS. It doesn't work especially well (see early Yann LeCun papers), and because it's a second-order method you're going to be attempting to approximate the Hessian, which for that many weights is a 655,616 x 655,616 matrix: this introduces a performance overhead that simply won't be justified.
The network isn't that deep: is there a reason you're avoiding standard back-prop? This is just gradient descent; if you have access to a library implementation, the gradients are cheap to compute, and because it's only a first-order method you won't have the same performance overhead.
EDIT:
Backpropagation means that the update rule for w_i at step t is:
w_i(t) = w_i(t-1) - \alpha (dError / dw_i)
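As a minimal sketch of that update rule (on a toy least-squares problem rather than a neural net, so the gradient is easy to verify by hand):

```python
import numpy as np

# Toy problem: recover true_w from y = X @ true_w by gradient descent.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
alpha = 0.05                        # learning rate
for _ in range(500):
    err = X @ w - y
    grad = X.T @ err / len(y)       # dError / dw_i for squared error
    w = w - alpha * grad            # w_i(t) = w_i(t-1) - alpha * (dError / dw_i)
```

For a real network the only thing that changes is how `grad` is computed: back-prop produces dError/dw_i layer by layer, one cheap pass per gradient evaluation, with no Hessian (or Hessian approximation) anywhere.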
Also, you've run into the reason why people in vision often use convolutional NNs: sparse connectivity massively reduces the size of the weight vector in settings where you would otherwise have one neuron per pixel.