
How to find out weights of attributes in K-nearest neighbors algorithm?

I have such code in python with dataset of house prices:

from sklearn.datasets import load_boston
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import scale

boston = load_boston()
y = boston.target
X = scale(boston.data)
knn = KNeighborsRegressor(n_neighbors=5, weights='distance', metric='minkowski', p=1)
knn.fit(X, y)

And now I can predict the target attribute, in this case the price:

knn.predict([[-0.41771335,  0.28482986, -1.2879095 , ..., -1.45900038,
     0.44105193, -1.0755623 ]])

As I understand it, this algorithm should find weights for each attribute to build a distance function. Where can I find the computed weight of each attribute? I want to know which attribute has the strongest correlation with house price.

Timrael asked Feb 08 '23

1 Answer

You would specify feature weights via the metric argument - the weights argument controls how neighbors are weighted, not features.

First off, your question rests on a slight misconception. The algorithm doesn't find a distance function - you supply it with a metric in which to compute distances, and a function to compute neighbor weights from those distances. With metric='minkowski' and p=1, as in your code, you are using the Manhattan (L1) distance; the default, p=2, would be the good old Euclidean distance (see the docs).

Neighbor weights are computed as the inverse of distance (also described in the docs), so you can find the k neighbors of a given point with the built-in kneighbors method and compute their weights manually:

import numpy as np

test = [[np.random.uniform(-1, 1) for _ in range(len(X[0]))]]

# kneighbors returns the distances first, then the neighbor indices
distances, indices = knn.kneighbors(test)
for d in distances[0]:
    weight = 1.0 / d
    print(weight)
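To sanity-check that inverse distance is exactly what weights='distance' uses, you can reproduce a prediction by hand. A minimal, self-contained sketch - it uses a synthetic dataset in place of the Boston data, since load_boston has been removed from recent scikit-learn releases:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import scale

# Synthetic stand-in for the Boston data
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)
X = scale(X)

knn = KNeighborsRegressor(n_neighbors=5, weights='distance', metric='minkowski', p=1)
knn.fit(X, y)

test = X[:1] + 0.01  # a query point close to (but not on) a training sample

# kneighbors returns the distances first, then the neighbor indices
distances, indices = knn.kneighbors(test)
weights = 1.0 / distances[0]

# weights='distance' means: inverse-distance-weighted average of the neighbors' targets
manual = np.sum(weights * y[indices[0]]) / np.sum(weights)
print(manual, knn.predict(test)[0])  # the two values agree
```

The small offset on the query point matters: at zero distance the inverse-distance weight blows up, and scikit-learn handles that case specially.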

The problem is that all features enter into the calculation of d with equal weight. With p=1 (the Manhattan distance) d is

1*|x1_neighbor - x1_test| + 1*|x2_neighbor - x2_test| + ...

(with p=2, the Euclidean distance, it would be the square root of the sum of the squared differences instead).

This is because the plain Minkowski metric gives every feature a coefficient of 1. If you want different feature weights, you can specify an alternate metric.

However, if you just want a quick and dirty way of telling how important the various features are, a typical approach is to randomly permute all values of feature i and see how much that hurts the performance of the regressor. You can read more about that here.
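The permutation trick described above is built into scikit-learn as sklearn.inspection.permutation_importance (available since version 0.22). A minimal sketch, again with a synthetic dataset standing in for the Boston data:

```python
from sklearn.datasets import make_regression
from sklearn.inspection import permutation_importance
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import scale

# Synthetic data: only 3 of the 6 features actually drive the target
X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       noise=1.0, random_state=0)
X = scale(X)

knn = KNeighborsRegressor(n_neighbors=5, weights='distance', metric='minkowski', p=1)
knn.fit(X, y)

# Shuffle each feature in turn and measure the resulting drop in R^2 score
result = permutation_importance(knn, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print("feature %d: importance %.3f" % (i, imp))
```

Features whose permutation barely changes the score contribute little to the distance-based predictions; the informative ones show a clearly larger drop.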

bjarkemoensted answered Feb 14 '23