
How to implement callable distance metric in scikit-learn?

I'm using the clustering module in Python's scikit-learn, and I'd like to use a normalized Euclidean distance. There is no built-in distance for this (that I know of) in the list of available metrics.

So I want to implement my own normalized Euclidean distance as a callable. The function is part of my distance module and is called distance.normalized_euclidean. It takes three inputs: X, Y, and SD.

The problem is that the normalized Euclidean distance needs the standard deviation of the sample population, while the pairwise distance in scipy only passes two inputs to the callable: X and Y.

How do I allow it to take an additional argument?

I tried putting it in as a **kwarg, but that didn't seem to work:

cluster = DBSCAN(eps=1.0, min_samples=1, metric=distance.normalized_euclidean, SD=stdv)

where distance.normalized_euclidean is the function I wrote; it takes two arrays, X and Y, along with SD, and computes the normalized Euclidean distance between them.

...but this throws an error:

TypeError: __init__() got an unexpected keyword argument 'SD'

What is the right way to pass additional keyword arguments?

The documentation says "Any further parameters are passed directly to the distance function", which made me think that this would be acceptable.

makansij asked Sep 27 '22 02:09


1 Answer

You can pass a lambda as the metric. It accepts the two input arrays and forwards them to your function with SD already filled in:

cluster = DBSCAN(eps=1.0, min_samples=1, metric=lambda X, Y: distance.normalized_euclidean(X, Y, SD=stdv))
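
The same binding can also be written with functools.partial instead of a lambda. The sketch below is only illustrative: normalized_euclidean here is a hypothetical stand-in for the asker's distance.normalized_euclidean (whose real body is not shown in the question), and the stdv values are made up. It assumes the distance divides each coordinate difference by the matching standard deviation before taking the Euclidean norm.

```python
import math
from functools import partial


def normalized_euclidean(X, Y, SD):
    """Euclidean distance with each coordinate difference divided by SD[i].

    Hypothetical stand-in for the asker's distance.normalized_euclidean.
    Works on any equal-length sequences of numbers, including 1-D arrays.
    """
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(X, Y, SD)))


# Assumed per-feature standard deviations (illustrative values only).
stdv = [2.0, 4.0]

# Bind SD once; the result is a callable that takes only (X, Y),
# which is the two-argument shape a callable metric is expected to have.
metric = partial(normalized_euclidean, SD=stdv)

# Differences (2, 4) scaled by (2, 4) give (1, 1), so this is sqrt(2).
print(metric([0.0, 0.0], [2.0, 4.0]))
```

The resulting metric object could then be passed as DBSCAN(eps=1.0, min_samples=1, metric=metric). Recent scikit-learn versions also expose a metric_params argument on DBSCAN for passing extra keyword arguments to the metric; whether it is available depends on your installed version, so check its documentation first.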
yangjie answered Oct 24 '22 10:10