Why doesn't metric='precomputed' work in scikit-learn's k-nearest neighbours?

I'm trying to fit a precomputed kernel matrix using KNeighborsClassifier (http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html). This is apparently possible, since the metric 'precomputed' exists. It allows you to pass an n_samples*n_samples kernel matrix to the fit method.

When I use it, here's what I get:

ValueError: Metric 'precomputed' not valid for algorithm 'auto'

I don't understand why using algorithm 'auto' to find nearest neighbours is incompatible with a precomputed kernel matrix.

EDIT :

Unfortunately my question didn't get any attention. I've looked into the source code more deeply and it seems there is a bug: when you pass metric='precomputed', the code should allow you to choose algorithm='auto'. Instead, it raises the ValueError I mentioned, and I don't think the author intended the code to behave that way. I have no idea how to change the source code so that it behaves properly.

I also want to add that, from a more theoretical point of view, it is completely justified to use a kernel matrix (aka Gram matrix) with the fit method of kNN. You can derive the distance matrix from the Gram matrix, and then to predict a new data point you just have to find its k nearest neighbours and assign it the most frequent label among them.
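The derivation mentioned above can be sketched as follows. This is a minimal illustration, not scikit-learn code: the helper name `gram_to_distance` is hypothetical, and it assumes the standard kernel-induced distance d(i, j)^2 = K[i, i] - 2*K[i, j] + K[j, j]. The toy data uses a linear kernel, for which this reduces to the ordinary Euclidean distance.

```python
import numpy as np


def gram_to_distance(K):
    """Convert a square Gram (kernel) matrix into kernel-induced distances.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    K = np.asarray(K, dtype=float)
    diag = np.diag(K)
    # d(i, j)^2 = K[i, i] - 2 * K[i, j] + K[j, j]
    sq = diag[:, None] - 2.0 * K + diag[None, :]
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(sq, 0.0))


# Toy data (assumed): three points on a line through the origin.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
K = X @ X.T  # linear kernel, so the result equals Euclidean distance
D = gram_to_distance(K)
```

For a prediction, the same formula would be applied to the kernel values between each new point and the training points, which gives an n_queries*n_samples matrix of distances.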

I really think this question should get an answer. It is properly asked and I want something really precise, and I know that someone with a deeper understanding of Python and the scikit-learn library should be able to answer it. Maybe I'm missing something obvious, but I also think an answer would help anyone trying to use kNN with a precomputed kernel matrix (which is not an isolated case).

Syzygy asked Jul 18 '16 06:07


1 Answer

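I guess this is way too late a reply, but in case you are still wondering: 'auto' won't work because the tree-based algorithms can't use a precomputed metric. KDTree accepts neither a user-defined nor a precomputed metric, and Ball Tree (algorithm='ball_tree') accepts a user-defined callable metric but not 'precomputed'. If you explicitly set algorithm='brute' together with metric='precomputed', it should work just fine. Note that 'precomputed' expects a matrix of distances, not kernel similarities. Hope this helps!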
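For what it's worth, here is a minimal sketch of fitting on a precomputed matrix. It assumes a scikit-learn version in which `algorithm='brute'` is the setting that accepts `metric='precomputed'`, and that the matrix holds distances rather than kernel values. The toy data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import KNeighborsClassifier

# Toy data (assumed): two well-separated 1-D clusters.
X_train = np.array([[0.0], [1.0], [10.0], [11.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.5], [10.5]])

# fit() takes a square (n_train, n_train) distance matrix.
D_train = pairwise_distances(X_train)
# predict() takes (n_test, n_train): distances from each query
# point to every training sample.
D_test = pairwise_distances(X_test, X_train)

knn = KNeighborsClassifier(
    n_neighbors=3, metric="precomputed", algorithm="brute"
)
knn.fit(D_train, y_train)
pred = knn.predict(D_test)
```

The algorithm values are spelled 'auto', 'ball_tree', 'kd_tree' and 'brute'. The brute-force search only needs to look up entries of the matrix, which is why it is the natural fit for precomputed distances.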

user8226519 answered Jan 03 '23 22:01