I have been trying to implement DBSCAN using scikit-learn, and so far I have failed to find values of epsilon (eps) and min_samples that give me a sizeable number of clusters. I tried computing the average value in the distance matrix and using values on either side of the mean, but I haven't obtained a satisfactory number of clusters:
Input:
from sklearn.cluster import DBSCAN
db = DBSCAN(eps=13.0, min_samples=100).fit(X)
labels = db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
print('Estimated number of clusters: %d' % n_clusters_)
Output:
Estimated number of clusters: 1
Input:
db = DBSCAN(eps=27.0, min_samples=100).fit(X)
Output:
Estimated number of clusters: 1
Some other information:
The average distance between any two points in the distance matrix is 16.8354.
The minimum distance is 1.0.
The maximum distance is 258.653.
Note that the X passed in the code is the matrix of feature vectors, not the distance matrix. How do I determine these parameters?
Plot a k-distance graph and look for a knee in it, as suggested in the DBSCAN article. (Your min_samples might be too high; with a value of 100 you probably won't see a knee in the 100-distance graph.)
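A minimal sketch of how the k-distance curve could be computed with scikit-learn, in case it helps. The helper name `k_distance_curve`, the synthetic `X_demo` data, and the choice `k=4` are assumptions made for illustration and are not from the question; substitute your own feature matrix X. Since scikit-learn's `min_samples` counts the point itself, `k = min_samples - 1` is one reasonable choice.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def k_distance_curve(X, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending."""
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    # Calling kneighbors() without arguments queries the fitted points
    # themselves and does not count a point as its own neighbor.
    distances, _ = nn.kneighbors()
    return np.sort(distances[:, -1])[::-1]


# Hypothetical stand-in for X: two dense blobs plus scattered noise.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0.0, 0.5, size=(200, 2)),
    rng.normal(5.0, 0.5, size=(200, 2)),
    rng.uniform(-3.0, 8.0, size=(40, 2)),
])

curve = k_distance_curve(X_demo, k=4)
```

Plotting `curve` against its index (for example with matplotlib's `plt.plot(curve)`) gives the k-distance graph; the distance value at the knee is a candidate for eps.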
Visualize your data. If you can't visually see clusters, there might be no clusters. DBSCAN cannot be forced to produce an arbitrary number of clusters. If your data set follows a single Gaussian distribution, it is supposed to come out as a single cluster only.
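If the feature vectors have more than two dimensions, one possible way to get a picture is to project them to 2D first. The sketch below uses PCA for that; the helper name `project_2d` and the random `X_demo` data are assumptions for illustration only. Keep in mind that a linear 2D projection can hide structure that exists in the full space, so treat the plot as a hint rather than proof.

```python
import numpy as np
from sklearn.decomposition import PCA


def project_2d(X):
    """Project feature vectors onto their first two principal components."""
    return PCA(n_components=2).fit_transform(X)


# Hypothetical stand-in for X: 300 points from one 10-dimensional Gaussian.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 10))

X_2d = project_2d(X_demo)
# To look at it: plt.scatter(X_2d[:, 0], X_2d[:, 1], s=5) with matplotlib.
```

For data like `X_demo`, drawn from a single Gaussian, the scatter plot shows one blob, which is the situation in which DBSCAN is expected to report a single cluster.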