Octave: logistic regression: difference between fmincg and fminunc

I often use fminunc for logistic regression problems. I have read on the web that Andrew Ng uses fmincg instead of fminunc, with the same arguments. The results are different, and fmincg is often slightly more accurate, though not by much. (I am comparing the results of fmincg against fminunc on the same data.)

So, my question is: what is the difference between these two functions? What algorithm does each one implement? (Right now, I just use these functions without knowing exactly how they work.)
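For context, this is the kind of call I mean. A minimal sketch, where X, y, initial_theta, and costFunction are placeholders for my own data and cost function:

% Minimal sketch: costFunction must return [J, grad] so fminunc can
% use the analytic gradient (X, y, initial_theta are placeholders).
function [J, grad] = costFunction(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                    % sigmoid hypothesis
  J = (-y' * log(h) - (1 - y)' * log(1 - h)) / m;    % logistic cost
  grad = (X' * (h - y)) / m;                         % gradient
end

options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t) costFunction(t, X, y), initial_theta, options);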

Thanks :)

asked Aug 24 '12 by hqt


2 Answers

You will have to look inside the code of fmincg because it is not part of Octave. After some searching, I found that it's a function file provided by the Machine Learning class on Coursera as part of the homework. Read the comments and answers on this question for a discussion about the algorithms.
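If you want to read it for yourself, Octave can print the source of any function file on the load path; assuming the course's fmincg.m is in your current directory or path, this will show its header comments:

% Print the source of fmincg (assumes fmincg.m is on the load path)
type fmincg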

answered Sep 17 '22 by carandraug

In contrast to other answers here suggesting that the primary difference between fmincg and fminunc is accuracy or speed, perhaps the most important difference for some applications is memory efficiency. In programming exercise 4 (i.e., neural network training) of Andrew Ng's Machine Learning class at Coursera, the comment in ex4.m about fmincg is:

%% =================== Part 8: Training NN ===================
% You have now implemented all the code necessary to train a neural
% network. To train your neural network, we will now use "fmincg", which
% is a function which works similarly to "fminunc". Recall that these
% advanced optimizers are able to train our cost functions efficiently as
% long as we provide them with the gradient computations.

Like the original poster, I was also curious about how the results of ex4.m might differ using fminunc instead of fmincg. So I tried to replace the fmincg call

options = optimset('MaxIter', 50);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

with the following call to fminunc

options = optimset('GradObj', 'on', 'MaxIter', 50);
[nn_params, cost, exit_flag] = fminunc(costFunction, initial_nn_params, options);

but got the following error message from a 32-bit build of Octave running on Windows:

error: memory exhausted or requested size too large for range of Octave's index type -- trying to return to prompt

A 32-bit build of MATLAB running on Windows provides a more detailed error message:

Error using find
Out of memory. Type HELP MEMORY for your options.
Error in spones (line 14)
[i,j] = find(S);
Error in color (line 26)
J = spones(J);
Error in sfminbx (line 155)
group = color(Hstr,p);
Error in fminunc (line 408)
[x,FVAL,~,EXITFLAG,OUTPUT,GRAD,HESSIAN] = sfminbx(funfcn,x,l,u, ...
Error in ex4 (line 205)
[nn_params, cost, exit_flag] = fminunc(costFunction, initial_nn_params, options);

The MATLAB memory command on my laptop computer reports:

Maximum possible array: 2046 MB (2.146e+09 bytes) *
Memory available for all arrays: 3402 MB (3.568e+09 bytes) **
Memory used by MATLAB: 373 MB (3.910e+08 bytes)
Physical Memory (RAM): 3561 MB (3.734e+09 bytes)
* Limited by contiguous virtual address space available.
** Limited by virtual address space available.

I had previously thought that Professor Ng chose fmincg to train the ex4.m neural network (which has 400 input features, 401 including the bias input) in order to increase training speed. However, I now believe his reason for using fmincg was to make the training memory-efficient enough to run on 32-bit builds of Octave/MATLAB. A short discussion of the work necessary to get a 64-bit build of Octave running on Windows is here.
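As a side note, the MATLAB stack trace above comes from the trust-region code in sfminbx. If you still want to experiment with fminunc, one thing to try is forcing the older medium-scale quasi-Newton algorithm, which avoids sfminbx entirely. This is only a sketch using the classic optimset flag; the exact option name depends on your MATLAB version, and I have not tested whether it fits in memory for this exercise:

% Force the line-search quasi-Newton path instead of the trust-region
% algorithm (classic optimset flag; version-dependent, untested here)
options = optimset('GradObj', 'on', 'MaxIter', 50, 'LargeScale', 'off');
[nn_params, cost, exit_flag] = fminunc(costFunction, initial_nn_params, options);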

answered Sep 21 '22 by gregS