Backpropagation for rectified linear unit activation with cross entropy error

I'm trying to implement gradient calculation for neural networks using backpropagation. I cannot get it to work with cross-entropy error and a rectified linear unit (ReLU) as the activation.

I managed to get my implementation working for squared error with sigmoid, tanh and ReLU activation functions. The gradient for cross-entropy (CE) error with sigmoid activation is also computed correctly. However, when I change the activation to ReLU, it fails. (I'm skipping tanh for CE because it returns values in the (-1,1) range.)

Is it because of the behavior of the log function at values close to 0 (which ReLUs return approx. 50% of the time for normalized inputs)? I tried to mitigate that problem with:

log(max(y,eps))

but it only helped bring the error and gradients back to real numbers; they still differ from the numerical gradient.

I verify the results using the numerical gradient:

num_grad = (f(W+epsilon) - f(W-epsilon)) / (2*epsilon)
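For a gradient check, this central difference is normally applied to one weight at a time, so each perturbation yields one component of the gradient. A minimal sketch, assuming a toy quadratic cost (hypothetical, standing in for the `backprop` cost) and an assumed step `h` (named so it does not shadow MATLAB's built-in `eps`):

```matlab
% Hypothetical toy cost standing in for the network cost f(W).
f = @(w) sum(w.^2) + 3*w(1);
W = [0.5; -1.2; 2.0];
h = 1e-5;                        % assumed perturbation size

% Central difference, perturbing only the k-th weight on each pass.
num_grad = zeros(size(W));
for k = 1:numel(W)
    e = zeros(size(W));
    e(k) = h;
    num_grad(k) = (f(W + e) - f(W - e)) / (2*h);
end

% Analytic gradient of the toy cost, and the usual relative error measure.
an_grad = 2*W + [3; 0; 0];
rel_err = norm(num_grad - an_grad) / norm(num_grad + an_grad);
```

A relative error of roughly 1e-7 or smaller usually indicates the analytic gradient is right; values near 1 point to a bug in the backward pass.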

The following MATLAB code presents a simplified, condensed backpropagation implementation used in my experiments:

function [f, df] = backprop(W, X, Y)
% W - weights
% X - input values
% Y - target values

act_type='relu';    % possible values: sigmoid / tanh / relu
error_type = 'CE';  % possible values: SE / CE

N=size(X,1); n_inp=size(X,2); n_hid=100; n_out=size(Y,2);
w1=reshape(W(1:n_hid*(n_inp+1)),n_hid,n_inp+1);
w2=reshape(W(n_hid*(n_inp+1)+1:end),n_out, n_hid+1);

% feedforward
X=[X ones(N,1)];
z2=X*w1'; a2=act(z2,act_type); a2=[a2 ones(N,1)];
z3=a2*w2'; y=act(z3,act_type);

if strcmp(error_type, 'CE')   % cross entropy error - logistic cost function
    f=-sum(sum( Y.*log(max(y,eps))+(1-Y).*log(max(1-y,eps)) ));
else % squared error
    f=0.5*sum(sum((y-Y).^2));
end

% backprop
if strcmp(error_type, 'CE')   % cross entropy error
    d3=y-Y;
else % squared error
    d3=(y-Y).*dact(z3,act_type);
end

df2=d3'*a2;
d2=d3*w2(:,1:end-1).*dact(z2,act_type);
df1=d2'*X;

df=[df1(:);df2(:)];

end

function f=act(z,type) % activation function
switch type
    case 'sigmoid'
        f=1./(1+exp(-z));
    case 'tanh'
        f=tanh(z);
    case 'relu'
        f=max(0,z);
end
end

function df=dact(z,type) % derivative of activation function
switch type
    case 'sigmoid'
        df=act(z,type).*(1-act(z,type));
    case 'tanh'
        df=1-act(z,type).^2;
    case 'relu'
        df=double(z>0);
end
end

Edit

After another round of experiments, I found that using a softmax for the last layer:

y=bsxfun(@rdivide, exp(z3), sum(exp(z3),2));

and the softmax cost function:

f=-sum(sum(Y.*log(y)));

makes the implementation work for all activation functions, including ReLU.
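With one-hot targets, the gradient of this softmax cost with respect to `z3` is exactly `y-Y`, which is why the existing `d3=y-Y` line is correct here. Separately, `exp(z3)` can overflow for large pre-activations (`exp(1000)` is `Inf` in double precision). A minimal sketch, assuming toy values for `z3` and `Y`, of a row-wise softmax shifted by the row maximum, plus a numerical check of the `y-Y` delta:

```matlab
% Toy pre-activations (assumed values); exp() of the second row would overflow.
z3 = [1 2 3; 1000 1001 1002];
Y  = [0 0 1; 1 0 0];             % one-hot targets: each row sums to 1

% Row-wise softmax, shifted by each row's maximum before exponentiating.
% The shift cancels in the ratio, so the probabilities are unchanged.
shifted      = @(z) bsxfun(@minus, z, max(z, [], 2));
softmax_rows = @(z) bsxfun(@rdivide, exp(shifted(z)), sum(exp(shifted(z)), 2));
cost         = @(z) -sum(sum(Y .* log(softmax_rows(z))));

y  = softmax_rows(z3);
d3 = y - Y;                      % analytic delta for softmax + this cost

% Central-difference check of d3, one element of z3 at a time.
h = 1e-6;
d3_num = zeros(size(z3));
for k = 1:numel(z3)
    e = zeros(size(z3));
    e(k) = h;
    d3_num(k) = (cost(z3 + e) - cost(z3 - e)) / (2*h);
end
max_err = max(abs(d3(:) - d3_num(:)));
```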

This leads me to the conclusion that it is the logistic cost function (binary classifier) that does not work with ReLU:

f=-sum(sum( Y.*log(max(y,eps))+(1-Y).*log(max(1-y,eps)) ));

However, I still cannot figure out where the problem lies.


asked Jun 22 '14 12:06

Pr1mer

1 Answer

Each squashing function in the output layer (sigmoid, tanh, softmax) implies a different cost function. So it makes sense that a ReLU in the output layer does not match the cross-entropy cost function: a ReLU output is not confined to (0,1), so it cannot be read as a probability, and log(1-y) is undefined whenever y > 1. I would try a simple squared-error cost function to test a ReLU output layer.
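By the chain rule, the logistic cost gives df/dz3 = (y-Y)./(y.*(1-y)).*dact(z3). For a sigmoid output, dact(z3) equals y.*(1-y), which cancels the denominator and leaves the `d3=y-Y` used in the question; for a ReLU output nothing cancels, so `y-Y` is not the gradient. A minimal sketch for a single output layer, with toy values (assumed, chosen so that 0 < y < 1 and the cost is defined), comparing both deltas against a central-difference gradient:

```matlab
% Toy output pre-activations, all in (0,1) so the logistic CE is defined.
z3 = [0.2 0.7; 0.4 0.9];
Y  = [0 1; 1 0];

relu  = @(z) max(0, z);
drelu = @(z) double(z > 0);
cost  = @(z) -sum(sum(Y .* log(relu(z)) + (1 - Y) .* log(1 - relu(z))));

y = relu(z3);
d3_shortcut = y - Y;                                   % valid only for a sigmoid output
d3_chain    = (y - Y) ./ (y .* (1 - y)) .* drelu(z3);  % full chain rule

% Central-difference gradient with respect to each element of z3.
h = 1e-6;
d3_num = zeros(size(z3));
for k = 1:numel(z3)
    e = zeros(size(z3));
    e(k) = h;
    d3_num(k) = (cost(z3 + e) - cost(z3 - e)) / (2*h);
end

err_chain    = max(abs(d3_chain(:)    - d3_num(:)));
err_shortcut = max(abs(d3_shortcut(:) - d3_num(:)));
```

Here `err_chain` is tiny while `err_shortcut` is large, matching the mismatch described in the question.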

The true power of ReLU is in the hidden layers of a deep net, since it does not suffer from the vanishing gradient problem.
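A rough illustration of that point, ignoring the weights (which also scale the backpropagated signal): each hidden layer multiplies the delta by the activation derivative, as in `d2=d3*w2(:,1:end-1).*dact(z2,act_type)`. The sigmoid derivative is at most 0.25, while an active ReLU contributes exactly 1. The depth and pre-activation value below are assumptions:

```matlab
n_layers = 10;                   % assumed depth
z = 0.5;                         % assumed positive pre-activation at every layer

sig   = 1 ./ (1 + exp(-z));
dsig  = sig .* (1 - sig);        % sigmoid derivative, at most 0.25 (at z = 0)
drelu = double(z > 0);           % ReLU derivative, exactly 1 for z > 0

% Factor contributed by the activation derivatives alone over n_layers.
sig_factor  = dsig^n_layers;
relu_factor = drelu^n_layers;
```

With these values the sigmoid factor is below 1e-6, while the ReLU factor stays at 1.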


answered Nov 15 '22 04:11

Seguy

Related questions

Return elements of the Groebner Basis as they are found
Free Energy Reinforcement Learning Implementation
is there any way to enlarge the font of menu bar and prompt windows in matlab
Real time music transcription [closed]
Matlab legend text overflows using Latex interpreter
Reading MatLab files in python w/ scipy
Canonical Correlation Analysis in Python with sklearn
Optimizing rank computation for very large sparse matrices
Why is batch mode so much faster than parfor?
Factorise symbolic expression (square of a sum) in MATLAB
How to change camera parameters (auto exposure, shutter speed, gain)?
MATLAB pcolor/surf bilinear interpolation (shading interp)
C/C++ Matlab compiler vs MKL
intersection and union of polygons
How to prevent LATEX-labels in MATLAB GUI to become blurry?
Graph a colored cube in matplotlib
Read Matlab matrix into Python
What extra data is stored by an anonymous function?
Analytical way of speeding up exp(A*x) in MATLAB
How to blend properly when stitching images in matlab?

Tags: machine-learning, neural-network, backpropagation, matlab