I'm currently working on a project that may require a kNN algorithm to find the top k nearest neighbors of a given point, say P. I'm using Python with the sklearn package, but our predefined metric is not one of the default metrics, so I have to use a user-defined metric (see the sklearn documentation, which can be found here and here).
It seems that the latest version of sklearn's kNN supports a user-defined metric, but I can't find how to use it:
import sklearn
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.neighbors import DistanceMetric
from sklearn.neighbors import BallTree
BallTree.valid_metrics
Say I have defined a metric called mydist = max(x - y), and then use DistanceMetric.get_metric to make it a DistanceMetric object:
dt = DistanceMetric.get_metric('pyfunc', func=mydist)
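As a side note, 'pyfunc' expects a callable that takes two 1-D arrays and returns a scalar, so the mydist above would be written out roughly like this (a sketch; note that max(x - y) is not symmetric, so it is not a proper distance in the metric sense):

import numpy as np

def mydist(x, y):
    # elementwise difference reduced with max; not symmetric,
    # so strictly speaking not a valid distance metric
    return np.max(x - y)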
From the documentation, the call should look like this:
nbrs = NearestNeighbors(n_neighbors=4, algorithm='auto', metric='pyfunc').fit(A)
distances, indices = nbrs.kneighbors(A)
But where can I put dt in? Thanks.
KNN (k-Nearest Neighbors) is a simple supervised classification algorithm used to assign a class to a new data point; it can be used for regression as well. KNN makes no assumptions about the data distribution, so it is non-parametric.
sklearn.neighbors provides functionality for unsupervised and supervised neighbors-based learning methods. Unsupervised nearest neighbors is the foundation of many other learning methods, notably manifold learning and spectral clustering.
First, import KNeighborsClassifier and create a KNN classifier object, passing the number of neighbors as an argument to KNeighborsClassifier(). Then fit the model on the training set with fit() and make predictions on the test set with predict(), as sketched below.
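A minimal sketch of that workflow; the toy data and split below are made up for illustration:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# hypothetical toy data: two small clusters with labels 0 and 1
X = np.array([[1, 2], [2, 3], [3, 3], [8, 8], [9, 10], [10, 9]])
y = np.array([0, 0, 0, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)  # number of neighbors as argument
knn.fit(X_train, y_train)                  # fit on the training set
predictions = knn.predict(X_test)          # predict on the test set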
You pass a metric as the metric parameter, and additional metric arguments as keyword parameters to the NearestNeighbors constructor:
>>> def mydist(x, y):
...     return np.sum((x - y) ** 2)
...
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> nbrs = NearestNeighbors(n_neighbors=4, algorithm='ball_tree',
...                         metric='pyfunc', func=mydist)
>>> nbrs.fit(X)
NearestNeighbors(algorithm='ball_tree', leaf_size=30, metric='pyfunc',
         n_neighbors=4, radius=1.0)
>>> nbrs.kneighbors(X)
(array([[  0.,   1.,   5.,   8.],
        [  0.,   1.,   2.,  13.],
        [  0.,   2.,   5.,  25.],
        [  0.,   1.,   5.,   8.],
        [  0.,   1.,   2.,  13.],
        [  0.,   2.,   5.,  25.]]),
 array([[0, 1, 2, 3],
        [1, 0, 2, 3],
        [2, 1, 0, 3],
        [3, 4, 5, 0],
        [4, 3, 5, 0],
        [5, 4, 3, 0]]))
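As a side note, newer scikit-learn versions drop the 'pyfunc'/func pair on the estimator; there you pass the callable itself as metric. A sketch of the same example under that API:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def mydist(x, y):
    # squared Euclidean distance, same as the example above
    return np.sum((x - y) ** 2)

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=4, algorithm='ball_tree', metric=mydist).fit(X)
distances, indices = nbrs.kneighbors(X)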
A small addition to the previous answer: how to use a user-defined metric that takes additional arguments.
>>> def mydist(x, y, **kwargs):
...     return np.sum((x - y) ** kwargs["power"])
...
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> Y = np.array([-1, -1, -2, 1, 1, 2])
>>> nbrs = KNeighborsClassifier(n_neighbors=4, algorithm='ball_tree',
...                             metric=mydist, metric_params={"power": 2})
>>> nbrs.fit(X, Y)
KNeighborsClassifier(algorithm='ball_tree', leaf_size=30,
           metric=<function mydist at 0x7fd259c9cf50>,
           n_neighbors=4, p=2, weights='uniform')
>>> nbrs.kneighbors(X)
(array([[  0.,   1.,   5.,   8.],
        [  0.,   1.,   2.,  13.],
        [  0.,   2.,   5.,  25.],
        [  0.,   1.,   5.,   8.],
        [  0.,   1.,   2.,  13.],
        [  0.,   2.,   5.,  25.]]),
 array([[0, 1, 2, 3],
        [1, 0, 2, 3],
        [2, 1, 0, 3],
        [3, 4, 5, 0],
        [4, 3, 5, 0],
        [5, 4, 3, 0]]))
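The entries of metric_params are forwarded as keyword arguments to the metric function, which is why the function above reads kwargs["power"]. The fitted classifier can then be used like any other sklearn estimator; for example, with a hypothetical query point:

query = np.array([[0, 0]])         # hypothetical new point
label = nbrs.predict(query)        # majority vote among the 4 nearest neighbors
probs = nbrs.predict_proba(query)  # per-class vote fractions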