
Neural network backprop not fully training

I have the neural network shown below. It works, or at least appears to work, but the problem is with the training. I'm trying to train it to act as an OR gate, but it never seems to get there; the output tends to look like this:

prior to training:

 [[0.50181624]
 [0.50183743]
 [0.50180414]
 [0.50182533]]

post training:

 [[0.69641759]
 [0.754652  ]
 [0.75447178]
 [0.79431198]]

expected output:

 [[0]
 [1]
 [1]
 [1]]

I have this loss graph:

[loss plot]

It's strange: it appears to be training, but at the same time it never quite reaches the expected output. I know it will never exactly hit the 0s and 1s, but I'd expect it to get considerably closer than this.

I had some trouble figuring out how to backpropagate the error, since I wanted the network to support any number of hidden layers, so I store a local gradient in each layer, alongside the weights, and send the error back from the output layer.
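
For reference, the per-layer chain rule I'm aiming for (squared-error loss, each layer computing $a_l = \sigma(a_{l-1} W_l + b_l)$) is just the standard textbook recursion, nothing specific to my code:

$$
\delta_L = (a_L - Y) \odot \sigma'(z_L), \qquad
\frac{\partial L}{\partial W_l} = a_{l-1}^{\top}\,\delta_l, \qquad
\delta_{l-1} = (\delta_l W_l^{\top}) \odot \sigma'(z_{l-1})
$$

where $z_l = a_{l-1} W_l + b_l$ and $\odot$ is element-wise multiplication.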

The functions I suspect as the culprits are NeuralNetwork.train and the two forward methods.

import sys
import math
import numpy as np
import matplotlib.pyplot as plt
from itertools import product


class NeuralNetwork:
    class __Layer:
        def __init__(self,args):
            self.__epsilon = 1e-6
            self.localGrad = 0
            self.__weights = np.random.randn(
                args["previousLayerHeight"],
                args["height"]
            )*0.01
            self.__biases = np.zeros(
                (args["biasHeight"],1)
            )

        def __str__(self):
            return str(self.__weights)

        def forward(self,X):
            a = np.dot(X, self.__weights) + self.__biases
            self.localGrad = np.dot(X.T,self.__sigmoidPrime(a))
            return self.__sigmoid(a)

        def adjustWeights(self, err):
            self.__weights -= (err * self.__epsilon)

        def __sigmoid(self, z):
            return 1/(1 + np.exp(-z))

        def __sigmoidPrime(self, a):
            return self.__sigmoid(a)*(1 - self.__sigmoid(a))

    def __init__(self,args):
        self.__inputDimensions = args["inputDimensions"]
        self.__outputDimensions = args["outputDimensions"]
        self.__hiddenDimensions = args["hiddenDimensions"]
        self.__layers = []
        self.__constructLayers()

    def __constructLayers(self):
        self.__layers.append(
            self.__Layer(
                {
                    "biasHeight": self.__inputDimensions[0],
                    "previousLayerHeight": self.__inputDimensions[1],
                    "height": self.__hiddenDimensions[0][0] 
                        if len(self.__hiddenDimensions) > 0 
                        else self.__outputDimensions[0]
                }
            )
        )

        for i in range(len(self.__hiddenDimensions)):
            self.__layers.append(
                self.__Layer(
                    {
                        "biasHeight": self.__hiddenDimensions[i + 1][0] 
                            if i + 1 < len(self.__hiddenDimensions)
                            else self.__outputDimensions[0],
                        "previousLayerHeight": self.__hiddenDimensions[i][0],
                        "height": self.__hiddenDimensions[i + 1][0] 
                            if i + 1 < len(self.__hiddenDimensions)
                            else self.__outputDimensions[0]
                    }
                )
            )

    def forward(self,X):
        out = self.__layers[0].forward(X)
        for i in range(len(self.__layers) - 1):
            out = self.__layers[i+1].forward(out)
        return out  

    def train(self,X,Y,loss,epoch=5000000):
        for i in range(epoch):
            YHat = self.forward(X)
            delta = -(Y-YHat)
            loss.append(sum(Y-YHat))
            err = np.sum(np.dot(self.__layers[-1].localGrad,delta.T), axis=1)
            err.shape = (self.__hiddenDimensions[-1][0],1)
            self.__layers[-1].adjustWeights(err)
            i=0
            for l in reversed(self.__layers[:-1]):
                err = np.dot(l.localGrad, err)
                l.adjustWeights(err)
                i += 1

    def printLayers(self):
        print("Layers:\n")
        for l in self.__layers:
            print(l)
            print("\n")

def main(args):
    X = np.array([[x,y] for x,y in product([0,1],repeat=2)])
    Y = np.array([[0],[1],[1],[1]])
    nn = NeuralNetwork(
        {
            #(height,width)
            "inputDimensions": (4,2),
            "outputDimensions": (1,1),
            "hiddenDimensions":[
                (6,1)
            ]
        }
    )

    print("input:\n\n",X,"\n")
    print("expected output:\n\n",Y,"\n")
    nn.printLayers()
    print("prior to training:\n\n",nn.forward(X), "\n")
    loss = []
    nn.train(X,Y,loss)
    print("post training:\n\n",nn.forward(X), "\n")
    nn.printLayers()
    fig,ax = plt.subplots()

    x = np.array([x for x in range(5000000)])
    loss = np.array(loss)
    ax.plot(x,loss)
    ax.set(xlabel="epoch",ylabel="loss",title="logic gate training")

    plt.show()

if(__name__=="__main__"):
    main(sys.argv[1:])

Could someone please point out what I'm doing wrong here? I strongly suspect it has to do with the way I'm handling the matrices, but I honestly don't have the slightest idea what's going on.

Thanks for taking the time to read my question, and taking the time to respond (if relevant).

Edit: Actually quite a lot is wrong with this, but I'm still a bit confused about how to fix it. Although the loss graph makes it look like it's training, and it kind of is, the math I've done above is wrong.

Look at the training function.

def train(self,X,Y,loss,epoch=5000000):
        for i in range(epoch):
            YHat = self.forward(X)
            delta = -(Y-YHat)
            loss.append(sum(Y-YHat))
            err = np.sum(np.dot(self.__layers[-1].localGrad,delta.T), axis=1)
            err.shape = (self.__hiddenDimensions[-1][0],1)
            self.__layers[-1].adjustWeights(err)
            i=0
            for l in reversed(self.__layers[:-1]):
                err = np.dot(l.localGrad, err)
                l.adjustWeights(err)
                i += 1

Note how I compute delta = -(Y-YHat) and then take the dot product with the "local gradient" of the last layer. The "local gradient" is the local W gradient.

def forward(self,X):
    a = np.dot(X, self.__weights) + self.__biases
    self.localGrad = np.dot(X.T,self.__sigmoidPrime(a))
    return self.__sigmoid(a)

I'm skipping a step in the chain rule. I should really be multiplying by W * sigprime(XW + b) first, as that's the local gradient with respect to X, and then by the local W gradient. I tried that, but I'm still having issues. Here is the new forward method (note that the layers' __init__ needs to initialise the new variables, and that I changed the activation function to tanh):

def forward(self, X):
    a = np.dot(X, self.__weights) + self.__biases
    self.localPartialGrad = self.__tanhPrime(a)
    self.localWGrad = np.dot(X.T, self.localPartialGrad)
    self.localXGrad = np.dot(self.localPartialGrad,self.__weights.T)            
    return self.__tanh(a)

and updated the training method to look something like this:

def train(self, X, Y, loss, epoch=5000):
    for e in range(epoch):
        Yhat = self.forward(X)
        err = -(Y-Yhat)
        loss.append(sum(err))
        print("loss:\n",sum(err))
        for l in self.__layers[::-1]:
            l.adjustWeights(err)
            if(l != self.__layers[0]):
                err = np.multiply(err,l.localPartialGrad)
                err = np.multiply(err,l.localXGrad)

The new loss graphs I'm getting are all over the place, and I have no idea what's going on. Here is the final bit of code I changed:

def adjustWeights(self, err):
    perr = np.multiply(err, self.localPartialGrad)  
    werr = np.sum(np.dot(self.__weights,perr.T),axis=1)
    werr = werr * self.__epsilon
    werr.shape = (self.__weights.shape[0],1)
    self.__weights = self.__weights - werr
asked by James on Feb 25 '18



1 Answer

Your network is learning, as can be seen from the loss chart, so the backprop implementation is correct (congrats!). The main problem with this particular architecture is the choice of activation function: sigmoid. I replaced sigmoid with tanh and it immediately works much better.

From this discussion on CV.SE:

There are two reasons for that choice (assuming you have normalized your data, and this is very important):

  • Having stronger gradients: since data is centered around 0, the derivatives are higher. To see this, calculate the derivative of the tanh function and notice that its range (output values) is [0,1]. The range of the tanh function is [-1,1] and that of the sigmoid function is [0,1].

  • Avoiding bias in the gradients. This is explained very well in the paper, and it is worth reading it to understand these issues.

Though I'm sure a sigmoid-based NN can be trained as well, it looks like it's much more sensitive to the input values (note that they are not zero-centered), because the activation itself is not zero-centered. tanh is better than sigmoid here by all accounts, so the simpler approach is just to use that activation function.
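
A quick numerical illustration of the "stronger gradients" point (this snippet is mine, not part of the quoted discussion): the sigmoid derivative peaks at 0.25 while the tanh derivative peaks at 1.0, so tanh passes back much larger gradients near zero.

import numpy as np

z = np.linspace(-3, 3, 7)
sig = 1 / (1 + np.exp(-z))
print(sig * (1 - sig))       # sigmoid'(z): maximum 0.25 at z = 0
print(1 - np.tanh(z) ** 2)   # tanh'(z): maximum 1.0 at z = 0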

The key change is this:

def __tanh(self, z):
  return np.tanh(z)

def __tanhPrime(self, a):
  return 1 - self.__tanh(a) ** 2

... instead of __sigmoid and __sigmoidPrime.
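
To make the swap concrete, here is a minimal sketch (mine, keeping the caching structure of the question's original Layer.forward rather than the later edit) of what that layer method looks like with tanh:

def forward(self, X):
    a = np.dot(X, self.__weights) + self.__biases
    # same caching scheme as in the question, just with the tanh derivative
    self.localGrad = np.dot(X.T, self.__tanhPrime(a))
    return self.__tanh(a)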

I have also tuned the hyperparameters a little, so that the network now learns in 100k epochs instead of 5M:

prior to training:

 [[ 0.        ]
 [-0.00056925]
 [-0.00044885]
 [-0.00101794]] 

post training:

 [[0.        ]
 [0.97335842]
 [0.97340917]
 [0.98332273]] 

[loss plot]

The complete code is in this gist.
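
Since gist links tend to rot, here is a minimal self-contained sketch of the same idea: a 2-6-1 tanh network trained on the OR data with plain gradient descent. It is not the gist itself, and the learning rate and epoch count below are illustrative choices rather than the tuned values from the answer:

import numpy as np
from itertools import product

# OR-gate data, same as in the question
X = np.array([[a, b] for a, b in product([0, 1], repeat=2)], dtype=float)  # shape (4, 2)
Y = np.array([[0.0], [1.0], [1.0], [1.0]])                                 # shape (4, 1)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 6)) * 0.5   # input -> hidden
b1 = np.zeros((1, 6))
W2 = rng.standard_normal((6, 1)) * 0.5   # hidden -> output
b2 = np.zeros((1, 1))

lr = 0.1                     # illustrative learning rate
for epoch in range(10000):   # illustrative epoch count
    # forward pass
    a1 = np.tanh(X @ W1 + b1)
    a2 = np.tanh(a1 @ W2 + b2)

    # backward pass: squared-error loss, chain rule applied layer by layer
    delta2 = (a2 - Y) * (1 - a2 ** 2)          # dL/dz2, using tanh'(z) = 1 - tanh(z)^2
    delta1 = (delta2 @ W2.T) * (1 - a1 ** 2)   # dL/dz1

    W2 -= lr * (a1.T @ delta2)
    b2 -= lr * delta2.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ delta1)
    b1 -= lr * delta1.sum(axis=0, keepdims=True)

print(np.tanh(np.tanh(X @ W1 + b1) @ W2 + b2))   # should be close to [[0], [1], [1], [1]]

The per-layer updates are the same chain-rule recursion discussed in the question's edit, just written out without caching gradients inside a Layer object.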

answered by Maxim on Oct 02 '22