I'm trying to implement my own kNN classifier. I've managed to implement something, but it's incredibly slow...
import numpy as np
from collections import Counter

def euclidean_distance(X_train, X_test):
    """
    Create list of all euclidean distances between the given
    feature vector and all other feature vectors in the training set
    """
    return [np.linalg.norm(X - X_test) for X in X_train]

def k_nearest(X, Y, k):
    """
    Get the indices of the nearest feature vectors and return a
    list of their classes
    """
    idx = np.argpartition(X, k)
    return np.take(Y, idx[:k])

def predict(X_test):
    """
    For each feature vector get its predicted class
    """
    distance_list = [euclidean_distance(X_train, X) for X in X_test]
    return np.array([Counter(k_nearest(distances, Y_train, k)).most_common()[0][0] for distances in distance_list])
where (for example)
X = [[ 1.96701284  6.05526865]
     [ 1.43021202  9.17058291]]
Y = [ 1.  0.]
Obviously it would be much faster if I didn't use any for loops, but I don't know how to make it work without them. Is there a way I can do this without using for loops / list comprehensions?
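Here's a vectorized approach:
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import mode

# X holds the test feature vectors; dists has shape (n_train, n_test)
dists = cdist(X_train, X)
# Row indices of the k nearest training points for each test point (unordered)
idx = np.argpartition(dists, k, axis=0)[:k]
# Class labels of those neighbours, shape (k, n_test)
nearest_labels = np.take(Y_train, idx)
# The most common label in each column is the prediction
out = mode(nearest_labels, axis=0)[0]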
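To make the idea easier to try out, here is a minimal self-contained sketch of the same cdist + argpartition approach wrapped in a function. The function name `knn_predict_vectorized` and the toy data are made up for illustration and are not from the question. It also swaps `scipy.stats.mode` for a plain NumPy vote count, so ties go to the smallest class label, which may differ from what `Counter.most_common` would pick.

```python
import numpy as np
from scipy.spatial.distance import cdist


def knn_predict_vectorized(X_train, Y_train, X_test, k):
    """Hypothetical helper: predict a class for every row of X_test."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    Y_train = np.asarray(Y_train)

    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = cdist(X_test, X_train)

    # Column indices of the k smallest distances in each row (unordered).
    # kth = k - 1 keeps the call valid even when k equals the training-set size.
    idx = np.argpartition(dists, k - 1, axis=1)[:, :k]

    # Labels of those neighbours, shape (n_test, k).
    nearest_labels = Y_train[idx]

    # Majority vote: count how often each distinct class appears per row.
    classes = np.unique(Y_train)
    votes = (nearest_labels[:, :, None] == classes).sum(axis=1)  # (n_test, n_classes)

    # argmax returns the first maximum, so ties go to the smallest class label.
    return classes[votes.argmax(axis=1)]


# Toy data (assumed for illustration): two well-separated clusters.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
Y_train = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
X_test = np.array([[0.2, 0.1], [5.5, 5.5]])

pred = knn_predict_vectorized(X_train, Y_train, X_test, k=3)
print(pred)
```

With this toy data the first test point should be assigned class 0 and the second class 1. Note that the full distance matrix has n_test * n_train entries, so for very large sets you may need to process the test set in chunks.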