KNeighborsClassifier with cross-validation returns perfect accuracy when k=1

I'm training a KNN classifier with scikit-learn's KNeighborsClassifier and cross-validation:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

k = 1
cv = 4  # e.g., 4-fold cross-validation
param_space = {'n_neighbors': [k]}
model = KNeighborsClassifier(n_neighbors=k, metric='euclidean')
search = GridSearchCV(model, param_space, cv=cv, verbose=10, n_jobs=8)
search.fit(X_df, y_df)
# predict on the same data the search was fitted on
preds = search.best_estimator_.predict(X_df)

With k=1 and any cv value (let's say cv=4), I get a perfect score:

from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_df, preds).ravel()
f1 = tp / (tp + 0.5 * (fp + fn))  # equivalent to sklearn.metrics.f1_score(y_df, preds)
# f1 is a perfect 1.0

It's worth noting that I use this method on multiple datasets, and whenever k=1 the score is a perfect 1. I even tried it on random data and still got f1=1.
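Here's roughly the random-data check I mean (the shapes, seed, and variable names are made up purely for illustration):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X_rand = rng.normal(size=(200, 5))     # random features
y_rand = rng.integers(0, 2, size=200)  # random binary labels

search = GridSearchCV(KNeighborsClassifier(metric='euclidean'),
                      {'n_neighbors': [1]}, cv=4)
search.fit(X_rand, y_rand)
preds_rand = search.best_estimator_.predict(X_rand)
print(f1_score(y_rand, preds_rand))  # still prints 1.0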

Is there any known bug with KNeighborsClassifier when k=1? Or am I missing something? Thanks in advance.

Netanel, asked Oct 27 '22
1 Answer

There's no bug. Your predictions come from best_estimator_, which is a copy of the estimator with the optimal hyperparameters (according to the cross-validation scores), refitted to the entire training set. So the confusion matrix you generate is really a training score, and for 1-nearest-neighbors that's trivially perfect: the nearest neighbor of each training point is itself, so it predicts its own label.
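The honest estimate is the cross-validated score, search.best_score_, which is computed on points held out from each fold's fit. A minimal sketch of the contrast, with random data made up purely for illustration:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

search = GridSearchCV(KNeighborsClassifier(metric='euclidean'),
                      {'n_neighbors': [1]}, cv=4)
search.fit(X, y)

# Training accuracy: trivially 1.0, since each point is its own nearest neighbor.
print(search.best_estimator_.score(X, y))
# Mean cross-validated accuracy: each fold is scored on held-out points,
# so with random labels this hovers around chance (~0.5).
print(search.best_score_)

On random labels the cross-validated score sits near chance, which is what a 1-NN model should honestly report here.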

Ben Reiniger, answered Oct 31 '22