 

Predict with sklearn-KNN using median (instead of mean)

Sklearn-KNN allows one to set weights (e.g., uniform, distance) when calculating the mean of the x nearest neighbours.

Instead of predicting with the mean, is it possible to predict with the median (perhaps with a user-defined function)?

Eugene Yan asked Nov 15 '15


People also ask

How do the predicted values get computed using KNN?

The KNN algorithm uses 'feature similarity' to predict the values of any new data points. This means that the new point is assigned a value based on how closely it resembles the points in the training set.
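
For illustration, here is a minimal sketch (my own toy example, not from the question) showing how KNeighborsRegressor turns that similarity into a prediction by averaging the targets of the k closest training points:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

knn = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)

# The prediction for 2.4 is the mean of the targets of its three closest
# training points (2.0, 3.0 and 1.0): (1.9 + 3.2 + 1.1) / 3 ~ 2.07
print(knn.predict([[2.4]]))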

How can I improve my KNN algorithm?

The key to improving the algorithm is to add a preprocessing stage so that the final algorithm runs on cleaner, more efficient data, which in turn improves the classification. Experimental results show that such an improved KNN algorithm gains both accuracy and efficiency of classification.
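
One common form of such preprocessing (an assumption on my part, not something specified above) is feature scaling, so that no single feature dominates the distance computation:

from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Toy data; scaling before KNN keeps the distance metric balanced across features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
print(model.score(X, y))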

Which method is used to find the nearest neighbors' distances in sklearn?

NearestNeighbors implements unsupervised nearest neighbors learning. It acts as a uniform interface to three different nearest neighbors algorithms: BallTree , KDTree , and a brute-force algorithm based on routines in sklearn.
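
A minimal sketch of that interface, with the algorithm chosen explicitly ('ball_tree', 'kd_tree', 'brute', or the default 'auto'):

import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.rand(50, 3)
nn = NearestNeighbors(n_neighbors=4, algorithm='kd_tree').fit(X)

# Distances and indices of the 4 nearest training points for the first two samples.
distances, indices = nn.kneighbors(X[:2])
print(distances.shape, indices.shape)  # (2, 4) (2, 4)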


1 Answer

There is no built-in parameter to adjust the weighting to use the median rather than the mean (you can see in the source that the mean is hard-coded). But because scikit-learn estimators are just Python classes, you can subclass KNeighborsRegressor and override the predict method to do whatever you want.

Here's a quick example, where I've copied and pasted the original predict() method and modified the relevant piece:

import numpy as np
from sklearn.neighbors.regression import KNeighborsRegressor, check_array, _get_weights

class MedianKNNRegressor(KNeighborsRegressor):
    def predict(self, X):
        X = check_array(X, accept_sparse='csr')

        neigh_dist, neigh_ind = self.kneighbors(X)

        weights = _get_weights(neigh_dist, self.weights)

        _y = self._y
        if _y.ndim == 1:
            _y = _y.reshape((-1, 1))

        ######## Begin modification
        if weights is None:
            y_pred = np.median(_y[neigh_ind], axis=1)
        else:
            # y_pred = weighted_median(_y[neigh_ind], weights, axis=1)
            raise NotImplementedError("weighted median")
        ######### End modification

        if self._y.ndim == 1:
            y_pred = y_pred.ravel()

        return y_pred    

X = np.random.rand(100, 1)
y = 20 * X.ravel() + np.random.rand(100)
clf = MedianKNNRegressor().fit(X, y)
print(clf.predict(X[:5]))
# [  2.38172861  13.3871126    9.6737255    2.77561858  17.07392584]

I've left out the weighted version, because I don't know of a simple way to compute a weighted median with numpy/scipy, but it would be straightforward to add in once that function is available.
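
For completeness, here is one possible weighted-median helper (my own sketch, not part of the original answer): sort the values, accumulate the weights, and return the first value whose cumulative weight reaches half of the total. Applied row-wise to _y[neigh_ind], it could replace the NotImplementedError branch above.

import numpy as np

def weighted_median(values, weights):
    # 1-D weighted median: would need to be applied per row (e.g. with
    # np.apply_along_axis) to plug into the predict() method above.
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum = np.cumsum(weights)
    return values[np.searchsorted(cum, 0.5 * cum[-1])]

print(weighted_median([1, 2, 3, 10], [1, 1, 1, 5]))  # 10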

jakevdp answered Sep 30 '22