Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are all labels_ are -1? Generated by DBSCAN in Python

![enter image description here][1]

from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=0.001, min_samples=10) 
clustering = dbscan.fit(X)

Example vectors:

array([[ 0.05811029, -1.089355  , -1.9143777 , ...,  1.235167  ,
    -0.6473859 ,  1.5684978 ],
   [-0.7117326 , -0.31876346, -0.45949244, ...,  0.17786546,
     1.9377285 ,  2.190525  ],
   [ 1.1685177 , -0.18201494,  0.19475089, ...,  0.7026453 ,
     0.3937522 , -0.78675956],
   ...,
   [ 1.4172379 ,  0.01070347, -1.3984257 , ..., -0.70529956,
     0.19471683, -0.6201791 ],
   [ 0.6171041 , -0.8058429 ,  0.44837445, ...,  1.216958  ,
    -0.10003573, -0.19012968],
   [ 0.6433722 ,  1.1571665 , -1.2123466 , ...,  0.592805  ,
     0.23889546,  1.6207514 ]], dtype=float32)

X is model.wv.vectors, generated from model = word2vec.Word2Vec(sent, min_count=1,size= 50,workers=3, window =3, sg = 1)

Results are as follows:

array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1])

like image 590
Jing Avatar asked Jan 16 '20 18:01

Jing


2 Answers

Based on the docs:

labels_array, shape = [n_samples]

Cluster labels for each point in the dataset given to fit(). Noisy samples are given the label -1.

The answer to this you can find here: What are noisy samples in Scikit's DBSCAN clustering algorithm?

Shortword: These are not exactly part of a cluster. They are simply points that do not belong to any clusters and can be "ignored" to some extent. It seems that you have really different data, which does not have central clustering classes.

What you can try?

DBSCAN(eps=0.5, min_samples=5, metric='euclidean', metric_params=None, algorithm='auto', leaf_size=30, p=None, n_jobs=None)

You can play with the parameters or change the clustering algorithm? Did you try kmeans?

like image 122
PV8 Avatar answered Sep 27 '22 23:09

PV8


Your eps value is 0.001; try increasing that so that you get clusters forming (or else every point will be considered an outlier / labelled -1 because it's not in a cluster)

like image 38
ron_g Avatar answered Sep 27 '22 23:09

ron_g