
Setting feature weights for KNN

I am working with sklearn's implementation of KNN. My input data has about 20 features, but I believe some of them are more important than others. Is there a way to:

  1. set a weight for each feature when "training" the KNN learner.
  2. learn the optimal weight values, with or without pre-processing the data.

On a related note, I understand that KNN generally does not require training, but since sklearn implements it using KD-trees, the tree must be built from the training data. However, this sounds like it turns KNN into a binary tree problem. Is that the case?

Thanks.

asked Nov 10 '13 by user2976570


2 Answers

kNN is simply based on a distance function. When you say "feature two is more important than the others", it usually means that a difference in feature two is worth, say, 10x a difference in the other coordinates. A simple way to achieve this is to multiply coordinate #2 by its weight. So you put into the tree not the original coordinates but the coordinates multiplied by their respective weights.
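
For example, with scikit-learn you can bake the weights in by scaling the columns before fitting. A minimal sketch, using made-up data and an illustrative weight vector:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: 100 samples with 20 features, binary labels.
rng = np.random.RandomState(0)
X = rng.rand(100, 20)
y = rng.randint(0, 2, 100)

# Illustrative weights: make feature #2 count 10x as much as the rest.
w = np.ones(20)
w[2] = 10.0

# Euclidean distance on the scaled columns equals a weighted
# distance on the original columns, so just fit on X * w.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X * w, y)

# Remember to apply the same weights to query points.
print(knn.predict(X[:3] * w))
```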

If your features are combinations of the underlying coordinates, you might need to apply an appropriate matrix transform to your coordinates before applying weights; see PCA (principal component analysis). PCA is also likely to help you with question 2.
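
A rough sketch of that combination, assuming you weight each principal component by its explained variance ratio; this is just one possible, unsupervised choice of weights, not a learned one:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Made-up data as before.
rng = np.random.RandomState(0)
X = rng.rand(100, 20)
y = rng.randint(0, 2, 100)

# Decorrelate the coordinates, then weight the resulting components.
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X)

# Illustrative (unsupervised) weighting: the fraction of variance
# each component explains.
w = pca.explained_variance_ratio_

knn = KNeighborsClassifier(n_neighbors=5).fit(X_pca * w, y)

# Queries must go through the same transform and weighting.
print(knn.predict(pca.transform(X[:3]) * w))
```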

answered Nov 13 '22 by Michael Simbirsky


The answer to question 2 is called "metric learning", and it is currently not implemented in Scikit-learn. Using the popular Mahalanobis distance amounts to rescaling the data using StandardScaler. Ideally you would want your metric to take the labels into account.
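
A minimal sketch of that rescaling point (note this only gives the diagonal Mahalanobis case, and the data here is made up):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data: 100 samples with 20 features, binary labels.
rng = np.random.RandomState(0)
X = rng.rand(100, 20)
y = rng.randint(0, 2, 100)

# Standardizing each feature and then using Euclidean distance is
# equivalent to a diagonal Mahalanobis distance on the raw data.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
print(model.predict(X[:3]))
```

If you want a metric learned from the labels, supervised algorithms such as LMNN exist outside Scikit-learn, for example in the separate metric-learn package.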

answered Nov 13 '22 by Andreas Mueller