Sklearn-KNN allows one to set weights (e.g., uniform, distance) when calculating the mean of the k nearest neighbours.
Instead of predicting with the mean, is it possible to predict with the median (perhaps with a user-defined function)?
The KNN algorithm uses 'feature similarity' to predict the values of any new data points. This means that the new point is assigned a value based on how closely it resembles the points in the training set.
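To make the idea concrete, here is a minimal sketch in plain NumPy, assuming Euclidean distance and unweighted neighbours. The function name `knn_predict` and its `reducer` argument are made up for illustration and are not part of scikit-learn; the toy data is chosen so that one outlying target shows the difference between a mean and a median prediction.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=5, reducer=np.mean):
    """Predict each query point from the targets of its k nearest training points."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Indices of the k closest training points for each query row
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Aggregate the neighbours' targets (mean by default, median if requested)
    return reducer(y_train[nearest], axis=1)


# Toy data: the last training point has an outlying target value
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0, 100.0])
X_query = np.array([[7.0]])

mean_pred = knn_predict(X_train, y_train, X_query, k=3, reducer=np.mean)
median_pred = knn_predict(X_train, y_train, X_query, k=3, reducer=np.median)
print(mean_pred, median_pred)  # the mean is pulled towards the outlier, the median is not
```

The three nearest targets here are 100, 3 and 2, so the mean prediction is 35 while the median prediction is 3.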
NearestNeighbors implements unsupervised nearest neighbors learning. It acts as a uniform interface to three different nearest neighbors algorithms: BallTree, KDTree, and a brute-force algorithm based on routines in sklearn.
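As a rough illustration of that uniform interface, the sketch below runs the same query through each backend using the `algorithm` parameter of `NearestNeighbors`. The toy data and the `results` dictionary are made up for this example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Four one-dimensional training points and a single query point
X = np.array([[0.0], [1.0], [2.0], [5.0]])
query = np.array([[1.8]])

results = {}
for algorithm in ("ball_tree", "kd_tree", "brute"):
    nn = NearestNeighbors(n_neighbors=2, algorithm=algorithm).fit(X)
    # kneighbors returns (distances, indices), each of shape (n_query, n_neighbors)
    dist, ind = nn.kneighbors(query)
    results[algorithm] = (dist, ind)
    print(algorithm, ind[0], dist[0])
```

All three backends should agree on the neighbours; they differ only in how the search is carried out.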
There is no built-in parameter to adjust the weighting to use the median rather than the mean (you can see in the source that the mean is hard-coded). But because scikit-learn estimators are just Python classes, you can subclass KNeighborsRegressor and override the predict method to do whatever you want.
Here's a quick example, where I've copied and pasted the original predict() method and modified the relevant piece:
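    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.utils import check_array
    # Private helper: recent scikit-learn releases keep it in sklearn.neighbors._base
    # (older releases exposed it via sklearn.neighbors.regression).
    from sklearn.neighbors._base import _get_weights


    class MedianKNNRegressor(KNeighborsRegressor):
        def predict(self, X):
            X = check_array(X, accept_sparse='csr')

            # Distances to, and indices of, the k nearest training points
            neigh_dist, neigh_ind = self.kneighbors(X)
            # None for uniform weighting, otherwise an array of per-neighbour weights
            weights = _get_weights(neigh_dist, self.weights)

            _y = self._y
            if _y.ndim == 1:
                _y = _y.reshape((-1, 1))

            ######## Begin modification
            if weights is None:
                y_pred = np.median(_y[neigh_ind], axis=1)
            else:
                # y_pred = weighted_median(_y[neigh_ind], weights, axis=1)
                raise NotImplementedError("weighted median")
            ######### End modification

            if self._y.ndim == 1:
                y_pred = y_pred.ravel()

            return y_pred


    X = np.random.rand(100, 1)
    y = 20 * X.ravel() + np.random.rand(100)

    clf = MedianKNNRegressor().fit(X, y)
    print(clf.predict(X[:5]))
    # Example output (the data is random, so your values will differ):
    # [ 2.38172861 13.3871126 9.6737255 2.77561858 17.07392584]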
I've left out the weighted version, because I don't know of a simple way to compute a weighted median with numpy/scipy, but it would be straightforward to add in once that function is available.
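For what it's worth, one possible way to fill in the missing piece is sketched below. The helper `weighted_median` is hypothetical (it is not a numpy, scipy, or scikit-learn function), and it assumes the "lower weighted median" definition: sort the values, accumulate their weights, and take the first value whose cumulative weight reaches half the total. It works on 2-D arrays of shape (n_queries, k), so for multi-output targets it would need to be applied per output column.

```python
import numpy as np


def weighted_median(values, weights):
    """Lower weighted median of each row of `values`, shape (n_rows, k).

    Hypothetical helper for illustration; `weights` must have the same shape
    as `values` and be non-negative with a positive sum in every row.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Sort each row by value and carry the weights along
    order = np.argsort(values, axis=1)
    sorted_vals = np.take_along_axis(values, order, axis=1)
    sorted_w = np.take_along_axis(weights, order, axis=1)

    # First position where the cumulative weight reaches half the row total
    cum_w = np.cumsum(sorted_w, axis=1)
    half = 0.5 * cum_w[:, -1:]
    idx = (cum_w >= half).argmax(axis=1)

    return sorted_vals[np.arange(values.shape[0]), idx]


equal = weighted_median([[1.0, 2.0, 3.0]], [[1.0, 1.0, 1.0]])
skewed = weighted_median([[1.0, 2.0, 100.0]], [[1.0, 1.0, 5.0]])
print(equal, skewed)
```

With equal weights and an even number of neighbours, this definition returns the lower of the two middle values rather than their average, so it will not always match np.median.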