I have implemented the Perceptron Learning Algorithm in Python as below. Even with 500,000 iterations, it still won't converge.
I have a training data matrix X with target vector Y, and a weight vector w to be optimized.
My update rule is:
while exist_mistakes:
    # dot product to check for mistakes
    output = [np.sign(np.dot(X[i], w)) == Y[i] for i in range(len(X))]
    # find the index of a mistake (chosen randomly to avoid repeating the same index)
    n = random.randint(0, len(X) - 1)
    while output[n]:  # if output[n] is True, this example is classified correctly, so choose again
        n = random.randint(0, len(X) - 1)
    # once we have found a mistake, update
    w = w + Y[n] * X[n]
Is this wrong? Or why is it not converging even after 500,000 iterations?
Perceptron Convergence Theorem: For any finite set of linearly separable labeled examples, the Perceptron Learning Algorithm will halt after a finite number of iterations; that is, it eventually yields a vector w that classifies all the examples perfectly.
If the training set is linearly separable, then the perceptron is guaranteed to converge. Furthermore, there is an upper bound on the number of times the perceptron will adjust its weights during training: classically, at most (R/γ)^2 updates, where R is the largest norm of any training example and γ is the separation margin.
Perceptron networks have several limitations. First, the output of a perceptron can take on only one of two values (0/1 or -1/+1, depending on the convention) because of the hard-limit (sign) transfer function. Second, perceptrons can only classify linearly separable sets of vectors.
If the dataset is linearly separable, the Perceptron converges to some separating hyperplane, though not necessarily the best one, and the number of updates it needs depends on the data and on the initial value of the weight vector. If the dataset is not linearly separable, the Perceptron algorithm does not converge and keeps cycling between some sets of weights.
In their 1969 book Perceptrons, Minsky and Papert (in)famously demonstrated that the perceptron learning algorithm is not guaranteed to converge for datasets that are not linearly separable.
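To see this cycling behavior concretely, here is a small sketch (my own illustration, not code from the question) that runs the same kind of update on the classic XOR labeling, which is not linearly separable; the mistake count never reaches zero no matter how long it runs:

import numpy as np
import random

# XOR-style labels on four points (with a constant bias feature appended);
# no single hyperplane separates the two classes.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
Y = np.array([-1, 1, 1, -1])

w = np.zeros(3)
mistakes = list(range(len(X)))
for _ in range(100000):
    mistakes = [i for i in range(len(X)) if np.sign(np.dot(X[i], w)) != Y[i]]
    if not mistakes:
        break
    n = random.choice(mistakes)
    w = w + Y[n] * X[n]

print(len(mistakes))  # still greater than zero: the weights cycle instead of converging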
If you're sure that your dataset is linearly separable, you might try adding a bias to each of your data vectors, as described in the question "Perceptron learning algorithm not converging to 0": adding a bias can help model decision boundaries that do not pass through the origin.
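A minimal sketch of that idea (assuming X is a NumPy array of shape (n_samples, n_features); the array contents below are made up purely for illustration):

import numpy as np

# Toy data standing in for your own X and Y.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
Y = np.array([1, 1, -1, -1])

# Append a constant 1 to every example so the learned weight vector includes
# a bias term, letting the decision boundary sit away from the origin.
X_bias = np.hstack([X, np.ones((X.shape[0], 1))])

# w gets one extra component (the bias weight); the update rule stays the same.
w = np.zeros(X_bias.shape[1])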
Alternatively, if you'd like to use a variant of the perceptron learning algorithm that is guaranteed to converge to a margin of specified width, even for datasets that are not linearly separable, have a look at the Averaged Perceptron -- PDF. The averaged perceptron is an approximation to the voted perceptron, which was introduced (as far as I know) in a nice paper by Freund and Schapire, "Large Margin Classification Using the Perceptron Algorithm" -- PDF.
Using an averaged perceptron, you make a copy of the parameter vector after each presentation of a training example during training. The final classifier uses the mean of all parameter vectors.
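As a rough sketch of that idea (my own simplified version, with an assumed function name averaged_perceptron and labels in {-1, +1}; not the exact formulation from the paper):

import numpy as np

def averaged_perceptron(X, Y, epochs=10):
    # X: (n_samples, n_features) array, Y: labels in {-1, +1}.
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    w_sum = np.zeros(n_features)  # running sum of the parameter vector
    count = 0
    for _ in range(epochs):
        for i in range(n_samples):
            if np.sign(np.dot(X[i], w)) != Y[i]:
                w = w + Y[i] * X[i]
            # accumulate the parameter vector after each presentation of an example
            w_sum += w
            count += 1
    # the final classifier uses the mean of all parameter vectors
    return w_sum / count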