
How do I cluster with KL-divergence?

I want to cluster my data with KL-divergence as my metric.

In K-means:

  1. Choose the number of clusters.

  2. Initialize each cluster's mean at random.

  3. Assign each data point to a cluster c with minimal distance value.

  4. Update each cluster's mean to that of the data points assigned to it.

In the Euclidean case it's easy to update the mean, just by averaging each vector.
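For reference, the Euclidean version of the steps above can be sketched like this (a minimal, illustrative implementation; the function name, seed, and iteration count are my own choices, not from any particular library):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal Euclidean K-means; points is a list of equal-length tuples."""
    rng = random.Random(seed)
    # Step 2: initialize each cluster's mean at random (here: random data points).
    means = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Step 3: assign each point to the cluster with minimal distance.
        clusters = [[] for _ in range(k)]
        for p in points:
            c = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, means[i])))
            clusters[c].append(p)
        # Step 4: update each mean to the average of its assigned points.
        for i, cl in enumerate(clusters):
            if cl:
                means[i] = [sum(xs) / len(cl) for xs in zip(*cl)]
    return means
```

The update step is just the coordinate-wise average, which is exactly the part that stops being obvious once the distance is KL-divergence instead of Euclidean.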

However, if I'd like to use KL-divergence as my metric, how do I update my mean?

Jing asked Feb 02 '13 10:02

2 Answers

Clustering with KL-divergence may not be the best idea, because KLD lacks an important property of a metric: symmetry. The resulting clusters can then be quite hard to interpret. If you want to go ahead with KLD anyway, you could use the average of the two KLDs as your distance, i.e.

d(x,y) = KLD(x,y)/2 + KLD(y,x)/2
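As a quick sketch, the symmetrized distance above could be computed like this (assuming discrete probability distributions with strictly positive entries; the function names are mine):

```python
import math

def kld(p, q):
    """KL(p || q) for discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def sym_kld(p, q):
    """Symmetrized distance: d(x,y) = KLD(x,y)/2 + KLD(y,x)/2."""
    return 0.5 * kld(p, q) + 0.5 * kld(q, p)
```

Unlike `kld` itself, `sym_kld(p, q)` equals `sym_kld(q, p)`, which is the property clustering needs.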

mitchus answered Sep 18 '22 20:09


It is not a good idea to use KLD, for two reasons:

  1. It is not symmetric: KLD(x,y) ≠ KLD(y,x).
  2. You need to be careful when using KLD in code: the division inside the logarithm can produce Inf values, and hence NaN results, whenever an entry of the second distribution is zero.

The usual workaround, adding a small number to every entry, avoids the Inf/NaN problem but may affect accuracy.
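A sketch of that workaround, epsilon smoothing (the epsilon value and renormalization scheme here are arbitrary choices of mine, not a standard):

```python
import math

def kld_smoothed(p, q, eps=1e-10):
    """KL(p || q) with epsilon smoothing so zero entries don't produce Inf/NaN.

    Smoothing slightly distorts both distributions, so the result is only
    approximate -- this is the accuracy trade-off mentioned above.
    """
    n = len(p)
    # Add eps to every entry and renormalize so each still sums to 1.
    p = [(pi + eps) / (1 + n * eps) for pi in p]
    q = [(qi + eps) / (1 + n * eps) for qi in q]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Without the smoothing, an input like `p = [1.0, 0.0], q = [0.0, 1.0]` would hit `log(1.0 / 0.0)`; with it, the result is finite but no longer the exact divergence of the original distributions.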

Bashar Haddad answered Sep 18 '22 20:09