HDBSCAN Python choose number of clusters

Question

Is it possible to select the number of clusters in the HDBSCAN algorithm in Python? Or is the only way to play around with input parameters such as alpha and min_cluster_size?

Thanks

UPDATE: here is code that uses fcluster with hdbscan to get a fixed number of clusters:

import hdbscan
from scipy.cluster.hierarchy import fcluster

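# X is the (n_samples, n_features) data array
clusterer = hdbscan.HDBSCAN()
clusterer.fit(X)
# Convert the single-linkage tree to a SciPy-style linkage matrix
Z = clusterer.single_linkage_tree_.to_numpy()
# Cut the tree so that at most 2 flat clusters remain
labels = fcluster(Z, 2, criterion='maxclust')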

Lib101 · Accepted Answer

Thankfully, in June 2020 a contributor on GitHub provided a commit (Module for flat clustering) that adds code to hdbscan allowing us to choose the number of resulting clusters.

To do so:

from hdbscan import flat

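# n_clusters is the desired number of flat clusters
clusterer = flat.HDBSCAN_flat(train_df, n_clusters, prediction_data=True)
# Assign new points to those clusters; returns labels and membership probabilities
labels, probabilities = flat.approximate_predict_flat(clusterer, points_to_predict, n_clusters)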

You can find the code in flat.py. You should be able to choose the number of clusters for test points using approximate_predict_flat.

In addition, a Jupyter notebook explaining how to use it has also been written.

Leland McInnes · Answer

If you explicitly need to get a fixed number of clusters then the closest thing to managing that would be to use the cluster hierarchy and perform a flat cut through the hierarchy at the level that gives you the desired number of clusters. That does involve working with one of the tree objects that HDBSCAN exposes and getting your hands a little dirty, but it can be done.
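A minimal sketch of such a flat cut, assuming the tree is available as a SciPy-style linkage matrix (which is what single_linkage_tree_.to_numpy() produces in the question's update). The helper name flat_cut and the toy data are hypothetical, and plain SciPy single linkage stands in for the HDBSCAN tree so that the example runs without hdbscan installed:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def flat_cut(Z, n_clusters):
    """Cut a SciPy-style linkage matrix so that at most n_clusters remain.

    Hypothetical helper: Z could also come from
    clusterer.single_linkage_tree_.to_numpy() on a fitted HDBSCAN object.
    """
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Toy data (an assumption for illustration): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centers])

# Stand-in hierarchy: ordinary single linkage on Euclidean distances,
# not HDBSCAN's mutual-reachability tree.
Z = linkage(X, method="single")
labels = flat_cut(Z, 3)
```

Note that fcluster assigns every point to some cluster, so a cut made this way has no noise label, unlike the labels_ attribute of a fitted HDBSCAN object.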

Tags: python, hierarchical-clustering

user1571823
