scikit-learn kmeans custom distance [duplicate]

Tags:

scikit-learn

I am looking to use the k-means algorithm to cluster some data, but I would like to use a custom distance function. Is there any way to change the distance function that scikit-learn uses?

I would also settle for a different framework or module that allows swapping the distance function and can run k-means in parallel. I would like to speed up the calculation, and parallel execution is a nice feature of scikit-learn.

Any suggestions?


asked Jun 29 '15 23:06

Nils Ziehn

1 Answer

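You could try the spectral clustering algorithm, which lets you supply your own precomputed matrix (calculated however you like). Note that it works on similarities (an affinity matrix) rather than raw distances, so a custom distance has to be converted into a similarity first.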

On convex clusters it performs comparably to K-means, and it also handles non-convex problems because it detects connectivity. See more here.

The good news is that spectral clustering is also implemented in scikit-learn, as sklearn.cluster.SpectralClustering.
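As a rough illustration of the idea, here is a minimal sketch that feeds a custom distance into scikit-learn's SpectralClustering through affinity="precomputed". The custom_distance function (Manhattan distance here), the toy data, and the bandwidth sigma are all assumptions made for the example, not part of the original question. The Gaussian kernel is just one common way to turn distances into similarities.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import SpectralClustering


def custom_distance(a, b):
    # Hypothetical custom metric (Manhattan distance); replace with your own.
    return np.abs(a - b).sum()


# Toy data (assumption): two well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(20, 2)),
])

# Full pairwise distance matrix computed with the custom metric.
distances = squareform(pdist(X, metric=custom_distance))

# Spectral clustering expects similarities, not distances, so convert
# with a Gaussian kernel. sigma is an assumed bandwidth to tune per dataset.
sigma = 2.0
affinity = np.exp(-distances ** 2 / (2.0 * sigma ** 2))

model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(affinity)
```

The pairwise matrix grows quadratically with the number of samples, so this approach is best suited to small or medium datasets.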

Hope it helps.


answered Oct 01 '22 00:10

gowithefloww
