Determining optimal number of clusters and Davies–Bouldin Index?

Question

I'm trying to determine the right number of clusters for clustering some data.

I know that this is possible using the Davies–Bouldin Index (DBI).

To use DBI, you compute it for each candidate number of clusters, and the number that minimizes the DBI is taken as the right number of clusters.

The question is:

How do I know whether 2 clusters are better than 1 cluster using DBI? In other words, how can I compute DBI when I have just 1 cluster?

greeness · Accepted Answer

Considering only the average DBI over all clusters is apparently not a good idea.

Increasing the number of clusters k without any penalty tends to drive the DBI down, and in the extreme case where each data point is its own cluster the DBI is zero (because each data point coincides with its own centroid, so every cluster has zero scatter).

how to know if 2 clusters are better than 1 cluster using DBI? So, how can I compute DBI when I have just 1 cluster?

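So it's hard to say which one is better if you use only the average DBI as the performance metric. Note also that DBI is built from ratios between pairs of clusters, so it is not defined when there is just 1 cluster.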
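To make both edge cases concrete, here is a minimal pure-Python sketch of the usual DBI definition (mean distance to the centroid as the scatter, Euclidean distance between centroids). The function name `davies_bouldin` and the toy points are only illustrative, and the sketch assumes no two centroids coincide.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index for points (tuples of floats) and their cluster labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        # Each cluster is compared with its most similar *other* cluster,
        # so there is nothing to compute when only one cluster exists.
        raise ValueError("DBI needs at least 2 clusters")

    # Centroid and scatter (mean distance to the centroid) of every cluster.
    centroids, scatter = {}, {}
    for lab, members in clusters.items():
        c = tuple(sum(col) / len(members) for col in zip(*members))
        centroids[lab] = c
        scatter[lab] = sum(math.dist(p, c) for p in members) / len(members)

    # For each cluster take the worst (largest) similarity ratio, then average.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
two_clusters = davies_bouldin(pts, [0, 0, 1, 1])   # two tight, well separated groups
singletons = davies_bouldin(pts, [0, 1, 2, 3])     # every point is its own cluster
```

With these toy points the singleton labelling scores exactly zero, which is lower than the two-cluster labelling even though it is a useless clustering, and passing a single label for all points raises `ValueError`.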

A good practical method is to use the Elbow method.

It looks at the percentage of variance explained as a function of the number of clusters: you should choose a number of clusters so that adding another cluster doesn't give much better modeling of the data. More precisely, if you graph the percentage of variance explained by the clusters against the number of clusters, the first clusters will add much information (explain a lot of variance), but at some point the marginal gain will drop, giving an angle in the graph. The number of clusters is chosen at this point, hence the "elbow criterion".
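The elbow idea can be sketched in pure Python as follows. This is only one possible heuristic for locating the angle (the k where the marginal gain drops the most); the helper names `explained_variance` and `elbow_k` are hypothetical, and the `curve` values are made up for illustration, not measured results.

```python
def _sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def explained_variance(points, labels):
    """Fraction of variance explained by a clustering: 1 - within SS / total SS."""
    n = len(points)
    grand_mean = [sum(col) / n for col in zip(*points)]
    total_ss = sum(_sq_dist(p, grand_mean) for p in points)

    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    within_ss = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        within_ss += sum(_sq_dist(p, centroid) for p in members)
    return 1.0 - within_ss / total_ss

def elbow_k(explained):
    """Return the k with the largest drop in marginal gain.

    explained[0] is the value for k = 1, explained[1] for k = 2, and so on.
    """
    if len(explained) < 3:
        raise ValueError("need values for at least three consecutive k")
    best_k, best_drop = None, float("-inf")
    for idx in range(1, len(explained) - 1):
        gain_before = explained[idx] - explained[idx - 1]
        gain_after = explained[idx + 1] - explained[idx]
        drop = gain_before - gain_after
        if drop > best_drop:
            best_k, best_drop = idx + 1, drop
    return best_k


pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
one_cluster = explained_variance(pts, [0, 0, 0, 0])   # k = 1 is well defined here
two_clusters = explained_variance(pts, [0, 0, 1, 1])

# Illustrative (made-up) explained-variance curve for k = 1..5.
curve = [0.0, 0.45, 0.85, 0.88, 0.90]
chosen_k = elbow_k(curve)
```

Unlike DBI, this quantity has a value for a single cluster (nothing is explained, so it is 0), which is why the curve can start at k = 1 and 1 cluster can be compared with 2.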


Some other good alternatives with respect to choosing the optimal number of clusters:

Determining the number of clusters in a data set
How to define number of clusters in K-means clustering?


Tags:

machine-learning

cluster-analysis

Gappa
