 

Neural network implementation in Java

I am attempting to implement an FFNN (feed-forward neural network) in Java with backpropagation, and I have no idea what I am doing wrong. It worked when I had only a single neuron in the network, but I wrote another class to handle larger networks and nothing converges. It seems like a problem in the math - or rather in my implementation of the math - but I've checked it several times and I can't find anything wrong. This should be working.
Node class:

package arr;

import util.ActivationFunction;
import util.Functions;

public class Node {
    public ActivationFunction f;
    public double output;
    public double error;

    private double sumInputs;
    private double sumErrors;
    public Node(){
        sumInputs = 0;
        sumErrors = 0;
        f = Functions.SIG;
        output = 0;
        error = 0;
    }
    public Node(ActivationFunction func){
        this();
        this.f = func;
    }

    public void addIW(double iw){
        sumInputs += iw;
    }
    public void addIW(double input, double weight){
        sumInputs += (input*weight);
    }
    public double calculateOut(){
        output = f.eval(sumInputs);
        return output;
    }

    public void addEW(double ew){
        sumErrors+=ew;
    }
    public void addEW(double error, double weight){
        sumErrors+=(error*weight);
    }
    public double calculateError(){
        error = sumErrors * f.deriv(sumInputs);
        return error;
    }   
    public void resetValues(){
        sumErrors = 0;
        sumInputs = 0;
    }
}

LineNetwork class:

package arr;
import util.Functions;

public class LineNetwork {
    public double[][][] weights;    // layer of node to, # of node to, # of node from
    public Node[][] nodes;          // layer, #
    public double lc;               // learning rate

    public LineNetwork(){
        weights = new double[2][][];
        weights[0] = new double[2][1];
        weights[1] = new double[1][3];
        initializeWeights();
        nodes = new Node[2][];
        nodes[0] = new Node[2];
        nodes[1] = new Node[1];
        initializeNodes();
        lc = 1;
    }

    private void initializeWeights(){
        for(double[][] layer: weights)
            for(double[] curNode: layer)
                for(int i=0; i<curNode.length; i++)
                    curNode[i] = Math.random()/10;
    }

    private void initializeNodes(){
        for(Node[] layer: nodes)
            for(int i=0; i<layer.length; i++)
                layer[i] = new Node();
        nodes[nodes.length-1][0].f = Functions.HSF;
    }

    public double feedForward(double[] inputs) {
        // first layer: each node gets exactly one weighted input
        for(int j=0; j<nodes[0].length; j++)
            nodes[0][j].addIW(inputs[j], weights[0][j][0]);
        double[] outputs = new double[nodes[0].length];
        for(int i=0; i<nodes[0].length; i++)
            outputs[i] = nodes[0][i].calculateOut();
        // remaining layers: weighted sum of the previous layer's outputs plus a bias weight
        for(int l=1; l<nodes.length; l++){
            for(int i=0; i<nodes[l].length; i++){
                for(int j=0; j<nodes[l-1].length; j++)
                    nodes[l][i].addIW(outputs[j], weights[l][i][j]);
                nodes[l][i].addIW(weights[l][i][weights[l][i].length-1]);   // bias term
            }
            outputs = new double[nodes[l].length];
            for(int i=0; i<nodes[l].length; i++)
                outputs[i] = nodes[l][i].calculateOut();
        }
        return outputs[0];
    }

    public void backpropagate(double[] inputs, double expected) {
        // error at the single output node
        nodes[nodes.length-1][0].addEW(expected-nodes[nodes.length-1][0].output);
        for(int l=nodes.length-2; l>=0; l--){
            for(Node n: nodes[l+1])
                n.calculateError();
            // propagate errors from layer l+1 back to layer l through the connecting weights
            for(int i=0; i<nodes[l].length; i++)
                for(int j=0; j<nodes[l+1].length; j++)
                    nodes[l][i].addEW(nodes[l+1][j].error, weights[l+1][j][i]);
            // update the weights (and bias) feeding layer l+1
            for(int j=0; j<nodes[l+1].length; j++){
                for(int i=0; i<nodes[l].length; i++)
                    weights[l+1][j][i] += nodes[l][i].output*lc*nodes[l+1][j].error;
                weights[l+1][j][nodes[l].length] += lc*nodes[l+1][j].error;
            }
        }
        // update the single weight into each first-layer node
        for(int i=0; i<nodes[0].length; i++){
            weights[0][i][0] += inputs[i]*lc*nodes[0][i].calculateError();
        }
    }

    public double train(double[] inputs, double expected) {
        double r = feedForward(inputs);
        backpropagate(inputs, expected);
        return r;
    }

    public void resetValues() {
        for(Node[] layer: nodes)
            for(Node n: layer)
                n.resetValues();
    }

    public static void main(String[] args) {
        LineNetwork ln = new LineNetwork();
        System.out.println(str2d(ln.weights[0]));
        for(int i=0; i<10000; i++){
            double[] in = {Math.round(Math.random()), Math.round(Math.random())};
            int out = 0;
            if(in[1]==1 ^ in[0]==1) out = 1;
            ln.resetValues();
            System.out.print(i+": {"+in[0]+", "+in[1]+"}: "+out+" ");
            System.out.println((int)ln.train(in, out));
        }
        System.out.println(str2d(ln.weights[0]));
    }

    private static String str2d(double[][] a){
        String str = "[";
        for(double[] arr: a)
            str = str + str1d(arr) + ",\n";
        str = str.substring(0, str.length()-2)+"]";
        return str;
    }

    private static String str1d(double[] a){
        String str = "[";
        for(double d: a)
            str = str+d+", ";
        str = str.substring(0, str.length()-2)+"]";
        return str;
    }
}

Quick explanation of the structure: every node has an activation function f; f.eval evaluates the function and f.deriv evaluates its derivative. Functions.SIG is the standard sigmoid function and Functions.HSF is the Heaviside step function. To set a node's inputs, you call addIW with a value that already incorporates the weight applied to the previous node's output (or pass the input and weight separately). A similar thing is done in backpropagation with addEW. Nodes are organized in a 2D array and weights are organized separately in a 3D array as described.
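To make that calling convention concrete, here is a tiny hypothetical sketch (not part of the project; the class name NodeDemo is made up) showing how a single Node is driven, assuming the Node and Functions classes from this question are on the classpath:

import arr.Node;

public class NodeDemo {
    public static void main(String[] args) {
        Node n = new Node();              // defaults to Functions.SIG
        n.addIW(0.5, 0.8);                // accumulate input * weight
        n.addIW(0.25);                    // or pass an already-weighted value (e.g. a bias)
        double out = n.calculateOut();    // sigmoid of the summed inputs

        n.addEW(0.1, 0.3);                // accumulate error * weight from the next layer
        double err = n.calculateError();  // weighted error sum times the activation's derivative
        System.out.println(out + " " + err);
        n.resetValues();                  // clear the sums before the next sample
    }
}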

I realize this may be a bit much to ask - and I certainly realize how many Java conventions this code breaks - but I appreciate any help anyone can offer.

EDIT: Since this question and my code are such giant walls of text, if there's a line involving lots of complicated expressions in brackets that you don't want to figure out, add a comment or something asking me and I'll try to answer it as quickly as I can.

EDIT 2: The specific problem here is that this network doesn't converge on XOR. Here's some output to illustrate this:

9995: {1.0, 0.0}: 1 1
9996: {0.0, 1.0}: 1 1
9997: {0.0, 0.0}: 0 1
9998: {0.0, 1.0}: 1 0
9999: {0.0, 1.0}: 1 1
Each line is of the format TEST NUMBER: {INPUTS}: EXPECTED ACTUAL. The network calls train on each test, so this network is backpropagating 10000 times.

Here are the two extra classes if anyone wants to run it:

package util;

public class Functions {
    public static final ActivationFunction LIN = new ActivationFunction(){
        public double eval(double x) {
            return x;
        }

        public double deriv(double x) {
            return 1;
        }
    };

    public static final ActivationFunction SIG = new ActivationFunction(){
        public double eval(double x) {
            return 1/(1+Math.exp(-x));
        }

        public double deriv(double x) {
            double ev = eval(x);
            return ev * (1-ev);
        }
    };

    public static final ActivationFunction HSF = new ActivationFunction(){
        public double eval(double x) {
            if(x>0) return 1;
            return 0;
        }

        public double deriv(double x) {
            return 1;
        }
    };
}

package util;

public interface ActivationFunction {
    public double eval(double x);
    public double deriv(double x);
}

Now it's even longer. Darn.

Nate Young asked Nov 10 '22


1 Answer

In your main method:

double[] in = {Math.round(Math.random()),Math.round(Math.random())};
int out = 0;
if(in[1]==1 ^ in[0] ==1) out = 1;

You create a random input (a combination of 1s and 0s) that receives its XOR target. Since Math.random uses a pseudorandom generator with an internal seed (there is no true randomness), you cannot guarantee that over 10000 iterations all 4 XOR inputs are generated a balanced number of times with this technique. This in turn means that it is possible that in 10000 iterations {0.0, 0.0} was only trained a couple of hundred times while {1.0, 0.0} and {0.0, 1.0} were trained about 8000 times. If this is the case, it would clearly explain your results and limit your training.
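If you want to see how evenly the four patterns actually come up with that scheme, a quick tally will show the distribution. This is a standalone sketch (the class name InputTally is made up, not part of the question's code) that simply counts how often each input pair is produced:

public class InputTally {
    public static void main(String[] args) {
        // reproduce the same Math.round(Math.random()) scheme used in the question's main()
        int[] counts = new int[4];
        for(int i=0; i<10000; i++){
            int a = (int) Math.round(Math.random());
            int b = (int) Math.round(Math.random());
            counts[2*a + b]++;   // index 0..3 encodes the pair {a, b}
        }
        System.out.println("{0,0}: " + counts[0] + "  {0,1}: " + counts[1]
                + "  {1,0}: " + counts[2] + "  {1,1}: " + counts[3]);
    }
}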

Instead of randomly generating your input data, pick randomly from a fixed data set. Keep the outer (epoch) loop and introduce a second loop in which you pick a random sample that you have not yet picked in this epoch (or simply go through your data sequentially without any randomness; it's not really an issue for XOR). Pseudocode without any randomness:

// use a custom class to realize the data structure (that defines the target values):
TrainingSet = { {(0,0),0}, {(0,1),1}, {(1,0),1}, {(1,1),0} } 
for epochNr < epochs:
    while(TrainingSet.hasNext()):
        input = TrainingSet.getNext();
        network.feedInput(input)

This way you can guarantee that each sample is seen 2500 times in 10000 iterations.
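In concrete terms, a minimal Java sketch of that sequential loop against the LineNetwork class from the question (the epoch count of 2500 is just the 10000 presentations split over 4 samples; this could replace the body of main) might look like this:

double[][] inputs  = { {0,0}, {0,1}, {1,0}, {1,1} };
double[]   targets = {  0,     1,     1,     0    };

LineNetwork ln = new LineNetwork();
for(int epoch = 0; epoch < 2500; epoch++){
    for(int s = 0; s < inputs.length; s++){
        ln.resetValues();                  // clear sums before each sample, as in the question
        ln.train(inputs[s], targets[s]);
    }
}
// after training, check each pattern once
for(int s = 0; s < inputs.length; s++){
    ln.resetValues();
    System.out.println(inputs[s][0] + ", " + inputs[s][1] + " -> " + ln.feedForward(inputs[s]));
}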

runDOSrun answered Nov 14 '22