I have asked a few questions about neural networks on this website in the past and have gotten great answers, but I am still struggling to implement one for myself. This is quite a long question, but I am hoping that it will serve as a guide for other people creating their own basic neural networks in MATLAB, so it should be worth it.
What I have done so far could be completely wrong. I am following the online stanford machine learning course by Professor Andrew Y. Ng and have tried to implement what he has taught to the best of my ability.
Can you please tell me if the feed forward and cost function parts of my code are correct, and where I am going wrong in the minimization (optimization) part?
I have a feed 2 layer feed forward neural network.
The MATLAB code for the feedforward part is:
function [ Y ] = feedforward2( X,W1,W2)
%This takes a row vector of inputs into the neural net with weight matrices W1 and W2 and returns a row vector of the outputs from the neural net
%Remember X, Y, and A can be vectors, and W1 and W2 Matrices
X=transpose(X); %X needs to be a column vector
A = sigmf(W1*X,[1 0]); %Values of the first hidden layer
Y = sigmf(W2*A,[1 0]); %Output Values of the network
Y = transpose(Y); %Y needs to be a column vector
So for example a two layer neural net with two inputs and two outputs would look a bit like this:
a1
x1 o--o--o y1 (all weights equal 1)
\/ \/
/\ /\
x2 o--o--o y2
a2
if we put in:
X=[2,3];
W1=ones(2,2);
W2=ones(2,2);
Y = feedforward2(X,W1,W2)
we get the the output:
Y = [0.5,0.5]
This represents the y1 and y2 values shown in the drawing of the neural net
The MATLAB code for the squared error cost function is:
function [ C ] = cost( W1,W2,Xtrain,Ytrain )
%This gives a value seeing how close W1 and W2 are to giving a network that represents the Xtrain and Ytrain data
%It uses the squared error cost function
%The closer the cost is to zero, the better these particular weights are at giving a network that represents the training data
%If the cost is zero, the weights give a network that when the Xtrain data is put in, The Ytrain data comes out
M = size(Xtrain,1); %Number of training examples
oldsum = 0;
for i = 1:M,
H = feedforward2(Xtrain,W1,W2);
temp = ( H(i) - Ytrain(i) )^2;
Sum = temp + oldsum;
oldsum = Sum;
end
C = (1/2*M) * Sum;
end
Example
So for example if the training data is:
Xtrain =[0,0; Ytrain=[0/57;
1,2; 3/57;
4,1; 5/57;
5,2; 7/57; a1
3,4; 7/57; %This will be for a two input one output network x1 o--o y1
5,3; 8/57; \/ \_o
1,5; 6/57; /\ /
6,2; 8/57; x2 o--o
2,1; 3/57; a2
5,5;] 10/57;]
We start with initial random weights
W1=[2,3; W2=[3,2]
4,1]
If we put in:
Y= feedforward2([6,2],W1,W2)
We get
Y = 0.9933
Which is far from what the training data says it should be (8/57 = 0.1404). So the initial random weights W1 and W2 where a bad guess.
To measure exactly how bad/good a guess the random weights weights are we use the cost function:
C= cost(W1,W2,Xtrain,Ytrain)
This gives the value:
C = 6.6031e+003
Minimizing the cost function
If we minimize the cost function by searching all of the possible variables W1 and W2 and then picking the lowest, this will give the network that best approximates the training data
But when I Use the code:
[W1,W2]=fminsearch(cost(W1,W2,Xtrain,Ytrain),[W1,W2])
It gives an error message. It says: "Error using horzcat. CAT arguments dimensions are not consistent."Why am I getting this error and what can I do to fix it?
Can you please tell me if the feed forward and cost function parts of my code are correct, and where I am going wrong in the minimization (optimization) part?
Thank you!!!
Your Neural network seems alright, although the kind of training you're trying to do is quite in-efficient if you're training against labeled data as you're doing. In that case I would suggest looking into Back-propagation
About your error when training: Your error message hints at the problem: dimensions are not consistent
As argument x0
in fminsearch
which is the initial guess for the optimizer, you send [W1, W2]
but from what I can see, these matrices don't have the same number of rows, and therefore you can't add them together like that. I would suggest modifying your cost-function to take a vector as argument and then form your weight-vectors for different layers from that one vector.
You are also not supplying the cost-function correctly to fminsearch
as you are just evaluating cost
with w1, w2, Xtrain and Ytrain in-place.
According to the documentation (it's been years since I used Matlab) it seems like you pass the pointer to the cost-function as
fminsearch(cost, [W1; W2])
EDIT: You could express your weights and modify your code as follows:
global Xtrain
global Ytrain
W = [W1; W2]
fminsearch(cost, W)
Cost-function must be modified such that it doesn't take Xtrain, Ytrain as input because fminsearch will then try to optimize those too. Modify your cost-function like this:
function [ C ] = cost( W )
W1 = W[1:2,:]
W2 = W[3,:]
global Xtrain
global Ytrain
...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With