I'm training a KNN classifier using scikit-learn's KNeighborsClassifier with cross-validation:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

k = 1
cv = 4
param_space = {'n_neighbors': [k]}
model = KNeighborsClassifier(n_neighbors=k, metric='euclidean')
search = GridSearchCV(model, param_space, cv=cv, verbose=10, n_jobs=8)
search.fit(X_df, y_df)
preds = search.best_estimator_.predict(X_df)
When k=1, with any cv value (let's say cv=4), I'm getting a perfect score:
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_df, preds).ravel()
f1 = tp / (tp + 0.5 * (fp + fn))  # same value as sklearn.metrics.f1_score for binary labels
# f1 is exactly 1.0
It is important to say that I use this method on multiple datasets, and every time k=1 the score is a perfect 1. I even tried it on random data and still got f1=1.
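For reference, here is a minimal version of that random-data test (the data shape, seed, and variable names are illustrative, not my real setup):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X_rand = rng.normal(size=(200, 5))     # features carry no signal at all
y_rand = rng.integers(0, 2, size=200)  # random binary labels

search_rand = GridSearchCV(KNeighborsClassifier(n_neighbors=1, metric='euclidean'),
                           {'n_neighbors': [1]}, cv=4)
search_rand.fit(X_rand, y_rand)
preds_rand = search_rand.best_estimator_.predict(X_rand)  # predicting on the training set
print(f1_score(y_rand, preds_rand))  # prints 1.0 even though the data is pure noise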
Is there any known bug with KNeighborsClassifier when k=1? Or am I missing something else?
Thanks in advance.
Your predictions are from the best_estimator_, which is a copy of the estimator with the optimal hyperparameters (according to the cross-validation scores), refitted to the entire training set. So the confusion matrix you generate is really a training score, and for 1-neighbors that's trivially perfect (the nearest neighbor of a point is itself).
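For an honest estimate, use the cross-validated score the search already computed, or generate out-of-fold predictions. A minimal sketch (cross_val_predict is one option; a held-out test set via train_test_split would work just as well):

from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Out-of-fold predictions: each point is predicted by a model fitted on the
# other folds, so 1-NN can no longer match a point to itself.
oof_preds = cross_val_predict(
    KNeighborsClassifier(n_neighbors=1, metric='euclidean'), X_df, y_df, cv=4)
print(f1_score(y_df, oof_preds))  # realistic score; near chance level on random data

# search.best_score_ already holds the mean score of the best parameters
# on the held-out folds (accuracy by default for classifiers).
print(search.best_score_)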