I'm training a KNN classifier using scikit-learn's KNeighborsClassifier with cross-validation:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

k = 1
cv = 4
param_space = {'n_neighbors': [k]}
model = KNeighborsClassifier(n_neighbors=k, metric='euclidean')
search = GridSearchCV(model, param_space, cv=cv, verbose=10, n_jobs=8)
search.fit(X_df, y_df)
preds = search.best_estimator_.predict(X_df)
When k=1, with any cv value (let's say cv=4), I'm getting a perfect score:
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_df, preds).ravel()
f1 = tp / (tp + 0.5 * (fp + fn))  # same value as sklearn.metrics.f1_score for binary labels
# f1 is exactly 1.0
It is important to say that I use this method on multiple datasets, and every time k=1 the score is a perfect 1. I even tried it on random data and still got f1=1.
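For reference, here is a minimal version of that random-data test (the data shape, seed, and variable names are illustrative, not my real setup):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X_rand = rng.normal(size=(200, 5))     # features carry no signal at all
y_rand = rng.integers(0, 2, size=200)  # random binary labels

search_rand = GridSearchCV(KNeighborsClassifier(n_neighbors=1, metric='euclidean'),
                           {'n_neighbors': [1]}, cv=4)
search_rand.fit(X_rand, y_rand)
preds_rand = search_rand.best_estimator_.predict(X_rand)  # predicting on the training set
print(f1_score(y_rand, preds_rand))  # prints 1.0 even though the data is pure noise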
Is there any known bug with KNeighborsClassifier when k=1? Or am I missing something else?
Thanks in advance.
Your predictions are from the best_estimator_, which is a copy of the estimator with the optimal hyperparameters (according to the cross-validation scores), refitted to the entire training set. So the confusion matrix you generate is really a training score, and for 1-neighbors that's trivially perfect (the nearest neighbor of a point is itself).
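For an honest estimate, use the cross-validated score the search already computed, or generate out-of-fold predictions. A minimal sketch (cross_val_predict is one option; a held-out test set via train_test_split would work just as well):

from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Out-of-fold predictions: each point is predicted by a model fitted on the
# other folds, so 1-NN can no longer match a point to itself.
oof_preds = cross_val_predict(
    KNeighborsClassifier(n_neighbors=1, metric='euclidean'), X_df, y_df, cv=4)
print(f1_score(y_df, oof_preds))  # realistic score; near chance level on random data

# search.best_score_ already holds the mean score of the best parameters
# on the held-out folds (accuracy by default for classifiers).
print(search.best_score_)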