DBSCAN with custom metric

I have the following:

a dataset with thousands of data points
a way of computing the similarity between data points, although I cannot plot the data points themselves in Euclidean space

I know that DBSCAN should support a custom distance metric, but I don't know how to use it.

Say I have a function:

def similarity(x, y):
    return ...  # the similarity of x and y

and I have a list of data whose items can be passed pairwise into that function. How do I specify this when using the DBSCAN implementation of scikit-learn?

Ideally, I want to get a list of the clusters, but I can't figure out how to get started in the first place.

There is a lot of terminology that still confuses me:

http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html

How do I pass a feature array, and what is it? How do I fit this implementation to my needs? How will I be able to get my "sublists" from this algorithm?

238

asked Feb 13 '18 13:02

zython

1 Answer

A "feature array" is simply an array holding the features of the data points in your dataset: one row per data point and one column per feature.
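As a rough illustration (the numbers are made up and `X` is just a conventional name), a feature array for scikit-learn could look like this:

```python
import numpy as np

# Hypothetical data: 4 data points, each described by 2 numeric features.
# Rows are data points, columns are features.
X = np.array([
    [1.0, 2.0],
    [1.1, 1.9],
    [8.0, 8.5],
    [7.9, 8.4],
])

n_samples, n_features = X.shape
print(n_samples, n_features)
```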

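metric is the parameter you're looking for. It can be a string (the name of a built-in metric) or a callable, and your similarity function is a callable. This isn't well described in the documentation, but a metric has to do just that: take two data points as parameters and return a number. Keep in mind that DBSCAN treats that number as a distance, so smaller values must mean "closer"; if your function returns a similarity where larger means more alike, convert it first (for example 1 - similarity for scores in [0, 1]). A callable metric is also called with rows of the numeric array passed to fit, so if your data points cannot be stored that way, pass metric="precomputed" and give fit a square matrix of pairwise distances instead.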

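import sklearn.cluster

def similarity(x, y):
    return ...  # must behave like a distance: smaller means closer

# fit() returns the fitted estimator; labels_ has one cluster label per data point (-1 = noise)
db = sklearn.cluster.DBSCAN(metric=similarity).fit(dataset)
labels = db.labels_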
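For data that cannot be turned into numeric vectors, one possible route is a precomputed distance matrix. The sketch below is only an illustration under assumptions: the `similarity` function (Jaccard similarity of character sets), the toy strings, and the `eps` and `min_samples` values are all made up here, and a similarity in [0, 1] is turned into a distance with `1 - similarity`. It also shows one way to collect the "sublists" from `labels_`.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import DBSCAN


def similarity(x, y):
    """Hypothetical stand-in: Jaccard similarity of character sets, in [0, 1]."""
    a, b = set(x), set(y)
    return len(a & b) / len(a | b)


# Toy data that has no natural Euclidean representation.
data = ["apple", "appel", "aple", "banana", "bananas", "xyz"]

# Build a symmetric matrix of pairwise distances (1 - similarity).
n = len(data)
distances = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = 1.0 - similarity(data[i], data[j])
        distances[i, j] = distances[j, i] = d

# eps and min_samples are illustrative; they need tuning for real data.
db = DBSCAN(eps=0.3, min_samples=2, metric="precomputed").fit(distances)

# Group the original items by cluster label; label -1 collects noise points.
clusters = defaultdict(list)
for item, label in zip(data, db.labels_):
    clusters[int(label)].append(item)

for label, members in sorted(clusters.items()):
    print(label, members)
```

The full matrix needs memory proportional to the square of the number of data points, which is usually still manageable for a few thousand items.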

119

answered Sep 27 '22 20:09

j4nw


Tags: python, cluster-analysis, scikit-learn
