from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=0.001, min_samples=10)
clustering = dbscan.fit(X)
Example vectors:
array([[ 0.05811029, -1.089355 , -1.9143777 , ..., 1.235167 ,
-0.6473859 , 1.5684978 ],
[-0.7117326 , -0.31876346, -0.45949244, ..., 0.17786546,
1.9377285 , 2.190525 ],
[ 1.1685177 , -0.18201494, 0.19475089, ..., 0.7026453 ,
0.3937522 , -0.78675956],
...,
[ 1.4172379 , 0.01070347, -1.3984257 , ..., -0.70529956,
0.19471683, -0.6201791 ],
[ 0.6171041 , -0.8058429 , 0.44837445, ..., 1.216958 ,
-0.10003573, -0.19012968],
[ 0.6433722 , 1.1571665 , -1.2123466 , ..., 0.592805 ,
0.23889546, 1.6207514 ]], dtype=float32)
X is model.wv.vectors, generated from model = word2vec.Word2Vec(sent, min_count=1, size=50, workers=3, window=3, sg=1)
Results are as follows:
array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1])
Based on the docs:
labels_ : array, shape = [n_samples]
Cluster labels for each point in the dataset given to fit(). Noisy samples are given the label -1.
The answer to this can be found here: What are noisy samples in Scikit's DBSCAN clustering algorithm?
In short: noisy samples are not part of any cluster. They are simply points that do not belong to any cluster and can be "ignored" to some extent. It seems that your points are all far apart from one another, with no dense central groups to cluster around.
What can you try?
DBSCAN(eps=0.5, min_samples=5, metric='euclidean', metric_params=None, algorithm='auto', leaf_size=30, p=None, n_jobs=None)
You can play with the parameters or switch to a different clustering algorithm. Have you tried k-means?
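If k-means is worth a try, a minimal sketch could look like the following. The array X here is synthetic stand-in data (two tight blobs in 50 dimensions), not the real model.wv.vectors, and n_clusters=2 is an assumption that only makes sense for this toy data; with real word vectors you would have to pick the number of clusters yourself.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for model.wv.vectors: two tight blobs in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(100, 50)),
    rng.normal(loc=5.0, scale=0.1, size=(100, 50)),
]).astype(np.float32)

# n_clusters is an assumption; k-means needs it up front, unlike DBSCAN.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

# Points per cluster. k-means assigns every point to a cluster, so there is no -1 label.
print(np.bincount(labels))
```

Note that k-means never marks points as noise, so outliers end up in whichever cluster is closest.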
Your eps value is 0.001; try increasing it so that clusters can form. Otherwise every point is treated as an outlier and labelled -1, because it is not in any cluster.
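One common way to get a starting value for eps is to look at how far each point is from its min_samples-th nearest neighbour. The sketch below does that on synthetic stand-in data (not the real model.wv.vectors). Taking the 90th percentile of those distances is just an assumed heuristic for illustration, not a rule from the scikit-learn docs.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-in for model.wv.vectors: two tight blobs in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(100, 50)),
    rng.normal(loc=5.0, scale=0.1, size=(100, 50)),
]).astype(np.float32)

min_samples = 10

# Distance from every point to its min_samples-th nearest neighbour.
# Querying with the training data counts the point itself as the first
# neighbour, which matches how DBSCAN counts min_samples.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)
k_dist = np.sort(distances[:, -1])

# Assumed heuristic: use a high percentile of the k-distances as eps.
eps_guess = float(np.percentile(k_dist, 90))

labels_tiny = DBSCAN(eps=0.001, min_samples=min_samples).fit(X).labels_
labels_guess = DBSCAN(eps=eps_guess, min_samples=min_samples).fit(X).labels_

print("smallest k-distance:", k_dist[0])
print("eps guess:", eps_guess)
print("noise points with eps=0.001:", int((labels_tiny == -1).sum()))
print("noise points with eps guess:", int((labels_guess == -1).sum()))
```

If the smallest k-distance is far larger than 0.001, no point can have 10 neighbours within eps, which is exactly the all -1 result shown in the question.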