
Dealing with Memory Problems in Network with Many Weights

I have a neural network with the architecture 1024, 512, 256, 1 (the input layer has 1024 units, the output layer has 1 unit, etc.). I would like to train this network using one of the optimization algorithms in scipy.optimize.

The problem is that these algorithms expect the function parameters to be given as one vector; this means that, in my case, I have to unroll all the weights into a vector of length

1024*512 + 512*256 + 256*1 = 655616
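
For concreteness, here is a minimal sketch of what that unrolling looks like in numpy (the pack/unpack helpers are illustrative, not part of scipy; biases are ignored):

import numpy as np

# Layer sizes from the architecture above: 1024 -> 512 -> 256 -> 1
shapes = [(1024, 512), (512, 256), (256, 1)]

def pack(weights):
    # Flatten a list of weight matrices into one 1-D parameter vector.
    return np.concatenate([w.ravel() for w in weights])

def unpack(vector):
    # Recover the individual weight matrices from the flat vector.
    weights, start = [], 0
    for rows, cols in shapes:
        end = start + rows * cols
        weights.append(vector[start:end].reshape(rows, cols))
        start = end
    return weights

flat = pack([np.zeros(shape) for shape in shapes])
assert flat.size == 1024*512 + 512*256 + 256*1  # 655616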

Some algorithms (like fmin_bfgs) need to use identity matrices, so they make a call like

I = numpy.eye(655616)

which, not very surprisingly, produces a MemoryError. Is there any way for me to avoid having to unroll all the weights into one vector, short of adapting the algorithms in scipy.optimize to my own needs?
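
A quick back-of-the-envelope calculation (assuming 8-byte float64 entries, which is numpy.eye's default dtype) shows why that call has to fail on ordinary hardware:

n = 655616
bytes_needed = n * n * 8    # dense n x n identity matrix
print(bytes_needed / 1e12)  # ~3.4 (terabytes), hence the MemoryError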


1 Answer

Don't try to fit the weights of a NN using L-BFGS. It doesn't work especially well (see early Yann LeCun papers), and because it's a second-order method you're going to be approximating the Hessian, which for that many weights is a 655,616 x 655,616 matrix: this introduces a performance overhead that simply won't be justified.

The network isn't that deep: is there a reason you're avoiding standard back-prop? This is just gradient descent; if you have access to a library implementation, the gradients are cheap to compute, and because it's only a first-order method, you won't have the same performance overhead.

EDIT:

Backpropagation means that the update rule for w_i at step t is:

w_i(t) = w_i(t-1) - alpha * (dError / dw_i)
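
A minimal sketch of that update for the 1024-512-256-1 network in plain numpy; the sigmoid activations, squared-error loss, learning rate and batch size are assumptions chosen for illustration, not something prescribed by the question:

import numpy as np

rng = np.random.default_rng(0)
sizes = [1024, 512, 256, 1]
# Small random initial weights; biases omitted to keep the sketch short.
W = [rng.normal(0, 0.01, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(X, y, W, alpha=0.01):
    # One gradient-descent step: w_i(t) = w_i(t-1) - alpha * dError/dw_i
    # Forward pass, keeping every layer's activations.
    a = [X]
    for w in W:
        a.append(sigmoid(a[-1] @ w))
    # Backward pass for the squared error 0.5 * (a[-1] - y)^2.
    delta = (a[-1] - y) * a[-1] * (1 - a[-1])
    for i in reversed(range(len(W))):
        grad = a[i].T @ delta
        if i > 0:
            delta = (delta @ W[i].T) * a[i] * (1 - a[i])
        W[i] -= alpha * grad
    return W

# Toy usage: one update on a batch of 32 random examples.
X = rng.normal(size=(32, 1024))
y = rng.integers(0, 2, size=(32, 1)).astype(float)
W = step(X, y, W)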

Also, you've run into the reason why people in vision often use Convolutional NNs: sparse connectivity massively reduces the size of the weight vector when you have one neuron per pixel.
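
For a rough sense of the scale (the image and kernel sizes below are illustrative, not taken from the question): a fully connected layer between two 32x32 maps wires every pixel to every pixel, while a single shared 5x5 convolutional kernel reuses the same handful of weights everywhere:

fc_weights = (32 * 32) * (32 * 32)   # 1,048,576 weights for one dense layer
conv_weights = 5 * 5                 # 25 weights for one shared kernel (bias ignored)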
