DBSCAN in scikit-learn of Python: save the cluster points in an array

Following the scikit-learn example "Demo of DBSCAN clustering algorithm", I am trying to store the x, y coordinates of the points in each cluster in an array.

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn import metrics
    # make_blobs is imported from sklearn.datasets; the old
    # sklearn.datasets.samples_generator module no longer exists.
    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler
    from pylab import *

    # Generate sample data
    centers = [[1, 1], [-1, -1], [1, -1]]
    X, labels_true = make_blobs(n_samples=750, centers=centers,
                                cluster_std=0.4, random_state=0)
    X = StandardScaler().fit_transform(X)

    # Plot the standardized points
    xx, yy = zip(*X)
    scatter(xx, yy)
    show()


    db = DBSCAN(eps=0.3, min_samples=10).fit(X)
    core_samples = db.core_sample_indices_
    labels = db.labels_
    # Number of clusters, not counting the noise label (-1) if present
    n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
    print(n_clusters_)  # prints 3


I'm trying to understand scikit-learn's DBSCAN implementation, but from this point on I'm having trouble. The number of clusters is 3 (n_clusters_), and I would like to store the x, y coordinates of the points in each cluster in an array.

Gianni Spear asked Aug 14 '13 16:08


2 Answers

The first cluster is X[labels == 0], etc.:

    clusters = [X[labels == i] for i in range(n_clusters_)]

and the outliers are

outliers = X[labels == -1] 
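
Put together with the setup from the question, this can be run end to end. The following is a minimal sketch rather than a definitive version: it assumes Python 3 and a recent scikit-learn (so make_blobs comes from sklearn.datasets and range replaces xrange), and it reuses the question's data and parameters unchanged.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Same synthetic data and DBSCAN parameters as in the question.
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers,
                            cluster_std=0.4, random_state=0)
X = StandardScaler().fit_transform(X)

db = DBSCAN(eps=0.3, min_samples=10).fit(X)
labels = db.labels_

# Cluster labels run from 0 to n_clusters_ - 1; noise points are labelled -1.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)

# One (n_points, 2) array of x, y coordinates per cluster,
# plus one array holding the noise points.
clusters = [X[labels == i] for i in range(n_clusters_)]
outliers = X[labels == -1]

for i, members in enumerate(clusters):
    print("cluster %d: %d points" % (i, len(members)))
print("outliers: %d points" % len(outliers))
```

Because the clusters generally have different sizes, a list of arrays is used here instead of a single rectangular array.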
Fred Foo answered Sep 21 '22 15:09


What do you mean by "of each cluster"?

In DBSCAN, clusters are not represented as centroids as in k-means, so there is no obvious representation of the cluster except its members. You already have the x and y position of the cluster members, as they are the input data.

So I'm not sure what the question is.
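
If the goal is only to have the members' coordinates grouped by cluster, one possible reading is sketched below. It assumes Python 3 and a recent scikit-learn, reuses the data and parameters from the question, and the name xy_by_label is introduced here purely for illustration. Each entry holds the x and y columns of the input rows that received a given label, which shows that the "representation" of a cluster is just a subset of the input data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Data and parameters taken from the question.
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=750, centers=centers,
                  cluster_std=0.4, random_state=0)
X = StandardScaler().fit_transform(X)
labels = DBSCAN(eps=0.3, min_samples=10).fit(X).labels_

# Hypothetical container: label -> (x values, y values) of its member points.
# The label -1, if present, collects the noise points.
xy_by_label = {}
for label in np.unique(labels):
    members = X[labels == label]  # rows of the input data with this label
    xy_by_label[int(label)] = (members[:, 0], members[:, 1])

for label, (xs, ys) in sorted(xy_by_label.items()):
    print(label, len(xs))
```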

Andreas Mueller answered Sep 18 '22 15:09