What are noisy samples in Scikit's DBSCAN clustering algorithm?

If I apply Scikit's DBSCAN (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) on a similarity matrix, I get a series of labels back. Some of these labels are -1. The documentation calls them noisy samples.

What are these? Do they all belong to a single cluster, or do they each belong to their own cluster since they're noisy?

Thank you

asked Jul 25 '17 20:07

Auxiliary

1 Answer

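They are not part of any cluster. The label -1 does not denote a single cluster, and the points do not each form their own cluster either; they are simply points that DBSCAN could not assign to any cluster, and they can be "ignored" to some extent.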

Remember, DBSCAN stands for "Density-Based Spatial Clustering of Applications with Noise." DBSCAN checks whether each point has enough neighbors within a specified radius (the eps and min_samples parameters in scikit-learn) before assigning points to clusters.
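To make the neighbor check concrete, here is a minimal sketch in plain NumPy. The data and the eps and min_samples values are made up for illustration, and the sketch only covers the "enough neighbors" test for core points; it does not reproduce the full algorithm (for example, it ignores border points).

```python
import numpy as np

# Made-up data: two tight groups and two isolated points.
X = np.array([
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [10.0, 0.0],
    [0.0, 10.0],
])

eps = 0.5          # neighborhood radius (illustrative value)
min_samples = 3    # required neighbors, counting the point itself

# Pairwise Euclidean distances between all points.
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Number of points within eps of each point (each point counts itself).
neighbor_counts = (dists <= eps).sum(axis=1)

# A point passes the density check if it has at least min_samples neighbors.
is_core = neighbor_counts >= min_samples

print(neighbor_counts.tolist())
print(is_core.tolist())
```

The two isolated points only have themselves as neighbors, so they fail the check.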

But what happens to the points that do not meet the criteria for falling into any of the main clusters? What if a point does not have enough neighbors within the specified radius, and is not within that radius of a point that does? These are the points that are given the label -1 and are considered noise.
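A small sketch of what this looks like with scikit-learn, using made-up data and illustrative eps and min_samples values. Note that the two noise points are far away from each other, yet both get -1, which shows that -1 is a "no cluster" marker and not a cluster of its own.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up data: two tight groups and two isolated points far from each other.
X = np.array([
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [10.0, 0.0],
    [0.0, 10.0],
])

# eps and min_samples are chosen for this toy data only.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

print(labels.tolist())  # [0, 0, 0, 1, 1, 1, -1, -1]

# The two points labelled -1 are nowhere near each other.
noise_points = X[labels == -1]
print(np.linalg.norm(noise_points[0] - noise_points[1]))
```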

So what?

Well, if you are analyzing data points and are only interested in the general clusters, you can reduce the size of the data by cutting out the noise. Or, if you are using cluster analysis to classify data, in some cases you can discard the noise points as outliers.
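Cutting out the noise can be done with a boolean mask on the labels. This is only a sketch on made-up data; whether dropping the noise is appropriate depends on your problem.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up data: two tight groups and two isolated points.
X = np.array([
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [10.0, 0.0],
    [0.0, 10.0],
])

labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

# Keep only the samples that were assigned to a cluster.
mask = labels != -1
X_clean = X[mask]
labels_clean = labels[mask]

# Count clusters without counting the noise label.
n_clusters = len(set(labels_clean.tolist()))

print(X_clean.shape, n_clusters)
```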

In anomaly detection, points that do not fit into any category are also significant, as they can represent a problem or rare event.
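For the anomaly-detection use case you would keep the noise points instead of discarding them. A rough sketch, again on made-up data with illustrative parameters:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up data: two tight groups and two isolated points.
X = np.array([
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [10.0, 0.0],
    [0.0, 10.0],
])

db = DBSCAN(eps=0.5, min_samples=3).fit(X)

# Indices and coordinates of the samples DBSCAN flagged as noise.
noise_idx = np.flatnonzero(db.labels_ == -1)
anomalies = X[noise_idx]

# Fraction of the data flagged as noise.
noise_fraction = len(noise_idx) / len(X)

print(noise_idx.tolist(), noise_fraction)
```

Keep in mind that which points end up as noise depends heavily on eps and min_samples, so the flagged points are candidates to inspect, not confirmed anomalies.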

answered Oct 11 '22 02:10

victor

Tags: python, cluster-analysis, scikit-learn, dbscan
