I have a neural network with the architecture 1024, 512, 256, 1 (the input layer has 1024 units, the output layer has 1 unit, etc.). I would like to train this network using one of the optimization algorithms in scipy.optimize.

The problem is that these algorithms expect the function parameters to be given as a single vector; this means that, in my case, I have to unroll all the weights into a vector of length

1024*512 + 512*256 + 256*1 = 655616

Some algorithms (like fmin_bfgs) need to use identity matrices, so they make a call like

I = numpy.eye(655616)

which, not very surprisingly, produces a MemoryError. Is there any way for me to avoid having to unroll all the weights into one vector, short of adapting the algorithms in scipy.optimize to my own needs?
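For context, the unrolling the scipy.optimize routines expect can be sketched like this, assuming a bias-free network matching the 655,616-parameter count above (the `flatten`/`unflatten` helper names are illustrative, not part of any library):

```python
import numpy as np

# Layer-to-layer weight matrix shapes for the 1024-512-256-1 network
# (no bias terms, matching the 655,616 count in the question).
shapes = [(1024, 512), (512, 256), (256, 1)]

def flatten(weights):
    """Unroll a list of weight matrices into one parameter vector."""
    return np.concatenate([w.ravel() for w in weights])

def unflatten(vec):
    """Rebuild the weight matrices from a flat parameter vector."""
    weights, i = [], 0
    for rows, cols in shapes:
        n = rows * cols
        weights.append(vec[i:i + n].reshape(rows, cols))
        i += n
    return weights

weights = [np.zeros(s) for s in shapes]
theta = flatten(weights)
assert theta.size == 1024*512 + 512*256 + 256*1  # 655616
```

The vector itself is only ~5 MB of float64; it is the dense 655616 × 655616 work matrices inside full BFGS that blow up memory.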
Don't try to fit the weights to a NN using L-BFGS. It doesn't work especially well (see early Yann LeCun papers), and because it's a second-order method you're going to be attempting to approximate the Hessian, which for that many weights is a 655,616 x 655,616 matrix: this introduces a performance overhead that simply won't be justified.
The network isn't that deep: is there a reason you're avoiding standard back-prop? This is just gradient descent; if you have access to a library implementation, the gradients are cheap to compute, and because it's only a first-order method you won't have the same performance overhead.
EDIT:
Backpropagation means that the update rule for w_i at step t is:
w_i(t) = w_i(t-1) - \alpha (dError / dw_i)
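As a minimal sketch of that update rule (on a toy least-squares problem rather than a neural net, so the gradient is easy to verify by hand):

```python
import numpy as np

# Toy problem: recover true_w from y = X @ true_w by gradient descent.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
alpha = 0.05                        # learning rate
for _ in range(500):
    err = X @ w - y
    grad = X.T @ err / len(y)       # dError / dw_i for squared error
    w = w - alpha * grad            # w_i(t) = w_i(t-1) - alpha * (dError / dw_i)
```

For a real network the only thing that changes is how `grad` is computed: back-prop produces dError/dw_i layer by layer, one cheap pass per gradient evaluation, with no Hessian (or Hessian approximation) anywhere.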
Also, you've run into the reason why people in vision often use convolutional NNs: sparse connectivity massively reduces the size of the weight vector in settings where you would otherwise have one neuron per pixel.