Multilayer perceptron implementation: weights go crazy

I am writing a simple implementation of an MLP with a single output unit (binary classification). I need it for teaching purposes, so I can't use an existing implementation :(

I managed to create a working dummy model and implemented a training function, but the MLP does not converge. The gradient for the output unit stays large over the epochs, so its weights grow toward infinity.

My implementation:

import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

X = np.loadtxt('synthetic.txt')
t = X[:, 2].astype(int)
X = X[:, 0:2]

# Sigmoid activation function for output unit
def logistic(x):
    return 1/(1 + np.exp(-x))

# derivative of the tanh activation function for hidden units
def tanh_deriv(x):
    return 1 - np.tanh(x)*np.tanh(x)

input_num = 2            # number of units in the input layer
hidden_num = 2           # number of units in the hidden layer

# initialize weights with random values:
weights_hidden =  np.array((2 * np.random.random( (input_num + 1, hidden_num + 1) ) - 1 ) * 0.25)
weights_out =  np.array((2 * np.random.random(  hidden_num + 1 ) - 1 ) * 0.25)


def predict(x):
    global input_num
    global hidden_num
    global weights_hidden 
    global weights_out 

    x = np.append(x.astype(float), 1.0)     # input to the hidden layer: features + bias term
    a = x.dot(weights_hidden)            # activations of the hidden layer
    z = np.tanh(a)                          # output of the hidden layer
    q = logistic(z.dot(weights_out))     # output of the decision layer
    if q >= 0.5:
        return 1
    return 0



def train(X, t, learning_rate=0.2, epochs=50):
    global input_num
    global hidden_num
    global weights_hidden 
    global weights_out 

    weights_hidden =  np.array((2 * np.random.random( (input_num + 1, hidden_num + 1) ) - 1 ) * 0.25)
    weights_out =  np.array((2 * np.random.random(  hidden_num + 1 ) - 1 ) * 0.25)

    for epoch in range(epochs):
        gradient_out = 0.0                       # gradients for output and hidden layers
        gradient_hidden = []

        for i in range(X.shape[0]):            
        # forward propagation
            x = np.array(X[i])                      
            x = np.append(x.astype(float), 1.0)  # input to the hidden layer: features + bias term
            a = x.dot(weights_hidden)            # activations of the hidden layer
            z = np.tanh(a)                       # output of the hidden layer
            q = z.dot(weights_out)               # activations to the output (decision) layer
            y = logistic(q)                      # output of the decision layer

        # backpropagation
            delta_hidden_s = []                  # delta and gradient for a single training sample (hidden layer)
            gradient_hidden_s = []

            delta_out_s = t[i] - y               # delta and gradient for a single training sample (output layer)
            gradient_out_s = delta_out_s * z

            for j in range(hidden_num + 1):                 
                delta_hidden_s.append(tanh_deriv(a[j]) * (weights_out[j] * delta_out_s))
                gradient_hidden_s.append(delta_hidden_s[j] * x)

            gradient_out = gradient_out + gradient_out_s             # accumulate gradients over training set
            gradient_hidden = gradient_hidden + gradient_hidden_s

    print "\n#", epoch, "Gradient out: ",gradient_out, 
        print "\n     Weights  out: ", weights_out

        # Now updating weights
        weights_out = weights_out - learning_rate * gradient_out

        for j in range(hidden_num + 1):
            weights_hidden.T[j] = weights_hidden.T[j] - learning_rate * gradient_hidden[j]



train(X, t, 0.2, 50)

And here is the evolution of the gradient and the weights of the output unit over the epochs:

  0 Gradient out:  [ 11.07640724  -7.20309009   0.24776626] 
    Weights  out:  [-0.15397237  0.22232593  0.03162811]

  1 Gradient out:  [ 23.68791197 -19.6688382   -1.75324703] 
    Weights  out:  [-2.36925382  1.66294395 -0.01792515]

  2 Gradient out:  [ 79.08612305 -65.76066015  -7.70115262] 
    Weights  out:  [-7.10683621  5.59671159  0.33272426]

  3 Gradient out:  [ 99.59798656 -93.90973727 -21.45674943] 
    Weights  out:  [-22.92406082  18.74884362   1.87295478]

...

  49 Gradient out:  [ 107.89975864 -105.8654327  -104.69591522] 
     Weights  out:  [-1003.67912726   976.87213404   922.38862049]

I tried different datasets and various numbers of hidden units. I tried updating the weights with addition instead of subtraction... Nothing helps.

Could somebody tell me what might be wrong? Thanks in advance.

asked by Dmytro Prylipko


1 Answer

I do not believe you should use the sum-of-squares error function for binary classification. Instead you should use the cross-entropy error function, which is basically a negative log-likelihood. This way the error gets much more expensive the further your prediction is from the correct answer. Please read the section on "Network Training" (p. 235 onwards) in "Pattern Recognition and Machine Learning" by Christopher Bishop; it will give you a proper overview of how to do supervised learning with an FFNN.
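
To make that concrete, here is a minimal sketch of the cross-entropy error for a single sample and the output-layer gradient it produces (this is not the asker's code; the helper names cross_entropy_error and output_gradient are just illustrative, and the numbers in the usage example are made up). With a sigmoid output unit the error derivative reduces to y - t, so the gradient with respect to the output weights is (y - t) * z:

import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# Cross-entropy error for one sample with target t in {0, 1}:
# E = -( t*log(y) + (1 - t)*log(1 - y) )
def cross_entropy_error(y, t):
    eps = 1e-12                          # keep log() away from zero
    y = np.clip(y, eps, 1.0 - eps)
    return -(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))

# For a sigmoid output unit, dE/dq = y - t, so the gradient of E with
# respect to the output weights is (y - t) * z, where z is the
# hidden-layer output (including the bias unit).
def output_gradient(y, t, z):
    return (y - t) * z

# Toy usage with made-up numbers:
z = np.array([0.3, -0.7, 1.0])           # hidden outputs + bias
w_out = np.array([0.1, -0.2, 0.05])
t_true = 1
y = logistic(z.dot(w_out))
print(cross_entropy_error(y, t_true))
print(output_gradient(y, t_true, z))

Note the sign convention: with the gradient written as (y - t) * z, gradient descent subtracts it (w_out -= learning_rate * gradient); if you define the delta as t - y instead, the update has to be an addition.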

The bias units are extremely important: they make it possible for the transfer function to shift along the x-axis, while the weights change the steepness of the transfer function's curve. Note this difference between biases and weights, as it gives a good intuition for why both need to be present in an FFNN.
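
As a quick illustration (a toy snippet, not part of the original code), scaling the weight changes how steep the sigmoid is, while the bias shifts where the transition happens:

import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4.0, 4.0, 9)

# A larger weight makes the transition steeper around the same point:
print(logistic(1.0 * x))          # gentle slope, crosses 0.5 at x = 0
print(logistic(4.0 * x))          # steep slope,  crosses 0.5 at x = 0

# A bias shifts where the transition happens, without changing the slope:
print(logistic(1.0 * x - 2.0))    # same slope, now crosses 0.5 at x = 2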

answered by Maal