 

HDBSCAN difference between parameters

I'm confused about the difference between the following parameters in HDBSCAN:

  1. min_cluster_size
  2. min_samples
  3. cluster_selection_epsilon

Correct me if I'm wrong.

For min_samples, if it is set to 7, then clusters formed need to have 7 or more points. For cluster_selection_epsilon, if it is set to 0.5 meters, then any clusters that are more than 0.5 meters apart will not be merged into one. That would mean each cluster only includes points that are 0.5 meters apart or less.

How is that different from min_cluster_size?

HR1 asked Apr 17 '26 05:04



1 Answer

They technically do two different things.

min_samples = the minimum number of neighbours a point needs in order to count as a core point. The higher this is, the more points are going to be discarded as noise/outliers. This comes from the DBSCAN part of HDBSCAN.
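To make the "core point" idea concrete: as far as I understand the hdbscan and scikit-learn docs, min_samples is turned into a *core distance* (the distance from a point to its min_samples-th nearest neighbour) and the hierarchy is then built on the *mutual reachability distance* max(core(a), core(b), d(a, b)). Below is a minimal pure-Python sketch of those two quantities. The function names are hypothetical, and counting the point itself as one of its own neighbours is an assumed convention.

```python
import math

def core_distance(points, i, min_samples):
    """Distance from points[i] to its min_samples-th nearest neighbour.

    Assumed convention: the point itself is counted (its 0.0 distance is in
    the sorted list), so min_samples=1 gives a core distance of 0.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[min_samples - 1]

def mutual_reachability(points, i, j, min_samples):
    """max(core(i), core(j), d(i, j)): sparse points get pushed further away."""
    return max(core_distance(points, i, min_samples),
               core_distance(points, j, min_samples),
               math.dist(points[i], points[j]))

# Three points close together plus one isolated point.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
for k in (1, 2, 3):
    print(k, [core_distance(pts, i, k) for i in range(len(pts))])
```

Raising min_samples pushes core distances up, and the isolated point at x = 10 always has by far the largest one, so with a higher min_samples it becomes harder for sparse points to join any cluster.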

min_cluster_size = the minimum size a final cluster can be. The higher this is, the bigger your clusters will be. This comes from the hierarchical ("H") part of HDBSCAN.

Increasing min_samples will also increase the size of the clusters, but it does so by discarding more data as outliers through the DBSCAN-style density requirement.

Increasing min_cluster_size while keeping min_samples small, by comparison, keeps those outliers but instead merges any smaller clusters with their most similar neighbour until every cluster has at least min_cluster_size points.
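For what it's worth, the hdbscan library's "How HDBSCAN Works" docs describe this step slightly differently from "merging": while walking down the cluster hierarchy, a split only counts as a real split if both sides have at least min_cluster_size points. Otherwise the smaller side is treated as points falling out of the parent, and the larger side keeps the parent's identity. The end effect is what's described above. Here is a rough sketch of that rule for a single split; the names are hypothetical and this is not the library's code.

```python
from enum import Enum

class SplitOutcome(Enum):
    TRUE_SPLIT = "both sides become new child clusters"
    PARENT_PERSISTS = "small side falls out; large side keeps the parent's identity"
    PARENT_ENDS = "both sides too small; the parent cluster ends here"

def classify_split(left_size: int, right_size: int, min_cluster_size: int) -> SplitOutcome:
    """Decide what one split of the hierarchy means, given min_cluster_size."""
    left_ok = left_size >= min_cluster_size
    right_ok = right_size >= min_cluster_size
    if left_ok and right_ok:
        return SplitOutcome.TRUE_SPLIT
    if left_ok or right_ok:
        return SplitOutcome.PARENT_PERSISTS
    return SplitOutcome.PARENT_ENDS

# A 50-point cluster splitting into 40 + 10 points:
print(classify_split(40, 10, min_cluster_size=5))
print(classify_split(40, 10, min_cluster_size=15))
```

With min_cluster_size=5 the 10-point group becomes a cluster of its own; with 15 it does not, which is why a larger min_cluster_size gives fewer, bigger clusters.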

So:

  1. If you want many highly specific clusters, use a small min_samples and a small min_cluster_size.
  2. If you want more generalized clusters but still want to keep most detail, use a small min_samples and a large min_cluster_size.
  3. If you want very general clusters and to discard a lot of noise in the clusters, use a large min_samples and a large min_cluster_size.

(It's not possible to use min_samples larger than min_cluster_size, afaik)
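On cluster_selection_epsilon, which the question also asks about: as far as I can tell from the hdbscan and scikit-learn docs, it is a distance threshold applied after clusters have been selected. Any selected cluster that split off from its sibling at a distance below epsilon is replaced by the nearest ancestor that split off at or above epsilon, so groups closer together than epsilon are not reported as separate clusters. It sets a floor on how close two separate clusters can be, not a cap on how spread out a single cluster is. The distance is the mutual reachability distance in the units of your metric (metres only if your coordinates are in metres). Below is a simplified sketch of that rule on a hypothetical toy tree, modelled loosely on my understanding of the library, not its actual implementation.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy tree: child cluster id -> (parent id, split distance), where
# the split distance is the mutual reachability distance at which the child
# separated from its sibling. The root (id 0 here) has no entry of its own.
Tree = Dict[int, Tuple[int, float]]

def apply_epsilon(selected: Set[int], tree: Tree, epsilon: float) -> Set[int]:
    """Replace clusters that split off below `epsilon` with a coarser ancestor.

    Simplification: never climbs all the way to the root (similar in spirit
    to allow_single_cluster=False).
    """
    result = set()
    for cluster in selected:
        node = cluster
        while tree[node][1] < epsilon:
            parent = tree[node][0]
            if parent not in tree:  # parent is the root: stop just below it
                break
            node = parent
        result.add(node)
    return result

# Root 0 splits into 1 and 2 at distance 5.0; cluster 1 splits into 3 and 4
# at 0.4; cluster 2 splits into 5 and 6 at 2.0.
tree = {1: (0, 5.0), 2: (0, 5.0), 3: (1, 0.4), 4: (1, 0.4), 5: (2, 2.0), 6: (2, 2.0)}
leaves = {3, 4, 5, 6}
print(sorted(apply_epsilon(leaves, tree, 0.0)))  # [3, 4, 5, 6]
print(sorted(apply_epsilon(leaves, tree, 0.5)))  # [1, 5, 6]
```

With epsilon = 0.5, clusters 3 and 4 (which separated at 0.4) are reported as their parent 1, while 5 and 6 (separated at 2.0) stay distinct.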

user3252344 answered Apr 19 '26 15:04



