I'm currently working on a project that may require a kNN algorithm to find the top k nearest neighbors of a given point, say P. I'm using Python with the sklearn package, but our predefined metric is not one of the default metrics, so I have to use a user-defined metric (see the sklearn documentation, which can be found here and here).
It seems that the latest version of sklearn's kNN supports a user-defined metric, but I can't find how to use it:
import sklearn
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.neighbors import DistanceMetric
from sklearn.neighbors import BallTree
BallTree.valid_metrics
Say I have defined a metric called mydist = max(x - y), and then use DistanceMetric.get_metric to make it a DistanceMetric object:
dt = DistanceMetric.get_metric('pyfunc', func=mydist)
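As a side note, 'pyfunc' expects a callable that takes two 1-D arrays and returns a scalar, so the mydist above would be written out roughly like this (a sketch; note that max(x - y) is not symmetric, so it is not a proper distance in the metric sense):

import numpy as np

def mydist(x, y):
    # elementwise difference reduced with max; not symmetric,
    # so strictly speaking not a valid distance metric
    return np.max(x - y)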
From the documentation, the call should look like this:
nbrs = NearestNeighbors(n_neighbors=4, algorithm='auto', metric='pyfunc').fit(A)
distances, indices = nbrs.kneighbors(A)
But where can I put dt in? Thanks.
KNN (k-Nearest Neighbors) is a simple supervised classification algorithm used to assign a class to a new data point; it can be used for regression as well. KNN makes no assumptions about the data distribution, so it is non-parametric.
sklearn.neighbors provides functionality for unsupervised and supervised neighbors-based learning methods. Unsupervised nearest neighbors is the foundation of many other learning methods, notably manifold learning and spectral clustering.
First, import KNeighborsClassifier and create a KNN classifier object, passing the number of neighbors as an argument to KNeighborsClassifier(). Then fit the model on the training set with fit() and make predictions on the test set with predict(), as sketched below.
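A minimal sketch of that workflow; the toy data and split below are made up for illustration:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# hypothetical toy data: two small clusters with labels 0 and 1
X = np.array([[1, 2], [2, 3], [3, 3], [8, 8], [9, 10], [10, 9]])
y = np.array([0, 0, 0, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)  # number of neighbors as argument
knn.fit(X_train, y_train)                  # fit on the training set
predictions = knn.predict(X_test)          # predict on the test set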
You pass a metric as the metric parameter, and additional metric arguments as keyword parameters to the NearestNeighbors constructor:
>>> def mydist(x, y):
...     return np.sum((x - y) ** 2)
...
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> nbrs = NearestNeighbors(n_neighbors=4, algorithm='ball_tree',
...                         metric='pyfunc', func=mydist)
>>> nbrs.fit(X)
NearestNeighbors(algorithm='ball_tree', leaf_size=30, metric='pyfunc',
         n_neighbors=4, radius=1.0)
>>> nbrs.kneighbors(X)
(array([[  0.,   1.,   5.,   8.],
        [  0.,   1.,   2.,  13.],
        [  0.,   2.,   5.,  25.],
        [  0.,   1.,   5.,   8.],
        [  0.,   1.,   2.,  13.],
        [  0.,   2.,   5.,  25.]]),
 array([[0, 1, 2, 3],
        [1, 0, 2, 3],
        [2, 1, 0, 3],
        [3, 4, 5, 0],
        [4, 3, 5, 0],
        [5, 4, 3, 0]]))
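As a side note, newer scikit-learn versions drop the 'pyfunc'/func pair on the estimator; there you pass the callable itself as metric. A sketch of the same example under that API:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def mydist(x, y):
    # squared Euclidean distance, same as the example above
    return np.sum((x - y) ** 2)

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=4, algorithm='ball_tree', metric=mydist).fit(X)
distances, indices = nbrs.kneighbors(X)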
A small addition to the previous answer: how to use a user-defined metric that takes additional arguments.
>>> def mydist(x, y, **kwargs):
...     return np.sum((x - y) ** kwargs["power"])
...
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> Y = np.array([-1, -1, -2, 1, 1, 2])
>>> nbrs = KNeighborsClassifier(n_neighbors=4, algorithm='ball_tree',
...                             metric=mydist, metric_params={"power": 2})
>>> nbrs.fit(X, Y)
KNeighborsClassifier(algorithm='ball_tree', leaf_size=30,
           metric=<function mydist at 0x7fd259c9cf50>,
           n_neighbors=4, p=2, weights='uniform')
>>> nbrs.kneighbors(X)
(array([[  0.,   1.,   5.,   8.],
        [  0.,   1.,   2.,  13.],
        [  0.,   2.,   5.,  25.],
        [  0.,   1.,   5.,   8.],
        [  0.,   1.,   2.,  13.],
        [  0.,   2.,   5.,  25.]]),
 array([[0, 1, 2, 3],
        [1, 0, 2, 3],
        [2, 1, 0, 3],
        [3, 4, 5, 0],
        [4, 3, 5, 0],
        [5, 4, 3, 0]]))
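The entries of metric_params are forwarded as keyword arguments to the metric function, which is why the function above reads kwargs["power"]. The fitted classifier can then be used like any other sklearn estimator; for example, with a hypothetical query point:

query = np.array([[0, 0]])         # hypothetical new point
label = nbrs.predict(query)        # majority vote among the 4 nearest neighbors
probs = nbrs.predict_proba(query)  # per-class vote fractions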