Setting feature weights for KNN

Question

I am working with sklearn's implementation of KNN. My input data has about 20 features, and I believe some of them are more important than others. Is there a way to:

1. set a weight for each feature when "training" the KNN learner, and
2. learn what the optimal weight values are, with or without pre-processing the data?

On a related note, I understand that KNN generally does not require training, but since sklearn can implement it with a KD tree, the tree must be built from the training data. However, this sounds like it's turning KNN into a binary tree problem. Is that the case?
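For the KD-tree part of the question: in scikit-learn the tree is only an index that speeds up an exact nearest-neighbour search, so it should not change which neighbours are found. The sketch below is a quick check of that under stated assumptions: it uses made-up random data and scikit-learn's NearestNeighbors, querying the same points once with algorithm="kd_tree" and once with algorithm="brute".

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Made-up data with 20 features, like the question's input.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 20))
X_query = rng.normal(size=(10, 20))

# Same exact k-NN query, two search strategies.
tree = NearestNeighbors(n_neighbors=5, algorithm="kd_tree").fit(X_train)  # builds a KD-tree index
brute = NearestNeighbors(n_neighbors=5, algorithm="brute").fit(X_train)   # just stores the data

dist_tree, idx_tree = tree.kneighbors(X_query)
dist_brute, idx_brute = brute.kneighbors(X_query)

# Both strategies should return the same neighbours at the same distances.
print(np.array_equal(idx_tree, idx_brute), np.allclose(dist_tree, dist_brute))
```

The "training" step for algorithm="kd_tree" is therefore just building the index; the prediction rule is still plain KNN.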

Thanks.

Michael Simbirsky · Accepted Answer

kNN is based entirely on a distance function. When you say "feature two is more important than the others", it usually means that a difference in feature two is worth, say, 10x the same difference in the other coords. A simple way to achieve this is to multiply coord #2 by its weight. So you put into the tree not the original coords but the coords multiplied by their respective weights, and you scale every query point the same way.
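A minimal sketch of this weighting, assuming Euclidean distance and a hypothetical weight vector in which feature two (column index 1) counts 10x and all other features 1x; the data comes from make_classification purely for illustration. Wrapping the scaling in a pipeline applies the same weights to the training data that goes into the neighbour index and to every query point at predict time.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

n_features = 20
X, y = make_classification(n_samples=300, n_features=n_features, random_state=0)

# Hypothetical weights: feature two (column index 1) counts 10x, the others 1x.
weights = np.ones(n_features)
weights[1] = 10.0

# Multiply every column by its weight before the neighbour index is built;
# the pipeline applies the same scaling to query points in predict().
weighted_knn = make_pipeline(
    FunctionTransformer(lambda Z: Z * weights),
    KNeighborsClassifier(n_neighbors=5),
)
weighted_knn.fit(X, y)
pred = weighted_knn.predict(X[:5])

# Equivalent by hand: weight the data yourself, then use a plain KNN.
manual_knn = KNeighborsClassifier(n_neighbors=5).fit(X * weights, y)
manual_pred = manual_knn.predict(X[:5] * weights)
print(pred, manual_pred)
```

The pipeline form is mainly a safeguard: forgetting to weight the query points is an easy mistake when doing it by hand.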

If your features are combinations of the coords, you might need to apply an appropriate matrix transform to your coords before applying the weights; see PCA (principal component analysis). PCA is likely to help you with question 2.
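One way to act on the PCA suggestion, sketched here as an assumption rather than as the answerer's recipe: standardize the data, rotate it onto principal components, and let cross-validation pick how many components to keep. Keeping only the first k components is the same as giving them weight 1 and the rest weight 0, so this is a crude, data-driven way to choose weights. The synthetic data and the grid values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's ~20-feature data.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),       # put features on a common scale first
    ("pca", PCA()),                    # rotate onto principal components
    ("knn", KNeighborsClassifier()),
])

# Keeping the first k components = weight 1 on those, weight 0 on the rest.
param_grid = {
    "pca__n_components": [2, 5, 10, 20],
    "knn__n_neighbors": [3, 5, 9],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Note that PCA itself ignores the labels; only the choice of k uses them, through the cross-validation score.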

Andreas Mueller · Answer

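The answer to question two is called "metric learning", and it is currently not implemented in scikit-learn. Using the popular Mahalanobis distance amounts to rescaling the data: with only per-feature variances (a diagonal covariance) this is exactly what StandardScaler does, while the full-covariance version corresponds to whitening the data. Ideally, though, you would want your metric to take the labels into account.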
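The sketch below checks the rescaling claim numerically on synthetic data (generated with no redundant features, so the covariance matrix is invertible): Euclidean distance after StandardScaler equals the Mahalanobis distance with a diagonal covariance, while the full-covariance Mahalanobis distance corresponds to whitening, here done with PCA(whiten=True). It then shows a label-aware option: scikit-learn releases from 0.21 on include NeighborhoodComponentsAnalysis (NCA), a supervised metric-learning method that learns a linear transformation for KNN from the labels. Whether NCA beats plain KNN depends on the data; the printed scores are not meant to imply a particular result.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# n_redundant=0 keeps the covariance matrix full rank (invertible).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)
u, v = X[0], X[1]

# StandardScaler == Mahalanobis with a diagonal covariance (per-feature variances, ddof=0).
Xs = StandardScaler().fit_transform(X)
d_scaled = np.linalg.norm(Xs[0] - Xs[1])
d_diag = mahalanobis(u, v, np.diag(1.0 / X.var(axis=0)))

# Whitening == Mahalanobis with the full covariance (PCA and np.cov both use ddof=1).
Xw = PCA(whiten=True).fit_transform(X)
d_white = np.linalg.norm(Xw[0] - Xw[1])
d_full = mahalanobis(u, v, np.linalg.inv(np.cov(X, rowvar=False)))

print(np.isclose(d_scaled, d_diag), np.isclose(d_white, d_full))  # True True

# Label-aware alternative: NCA learns a linear map from (X, y) before KNN.
plain_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
nca_knn = make_pipeline(
    StandardScaler(),
    NeighborhoodComponentsAnalysis(random_state=0),
    KNeighborsClassifier(n_neighbors=5),
)
plain_score = cross_val_score(plain_knn, X, y, cv=5).mean()
nca_score = cross_val_score(nca_knn, X, y, cv=5).mean()
print(f"plain KNN: {plain_score:.3f}  NCA + KNN: {nca_score:.3f}")
```

Unlike StandardScaler or whitening, NCA uses y, which is the "take the labels into account" property described above.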

Tags: scikit-learn, knn

user2976570
