
How do I cluster with KL-divergence?

I want to cluster my data with KL-divergence as my metric.

In K-means:

  1. Choose the number of clusters.

  2. Initialize each cluster's mean at random.

  3. Assign each data point to a cluster c with minimal distance value.

  4. Update each cluster's mean to that of the data points assigned to it.

In the Euclidean case it's easy to update the mean, just by averaging each vector.
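For reference, the Euclidean version of the steps above can be sketched like this (a minimal, illustrative implementation; the function name, seed, and iteration count are my own choices, not from any particular library):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal Euclidean K-means; points is a list of equal-length tuples."""
    rng = random.Random(seed)
    # Step 2: initialize each cluster's mean at random (here: random data points).
    means = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Step 3: assign each point to the cluster with minimal distance.
        clusters = [[] for _ in range(k)]
        for p in points:
            c = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, means[i])))
            clusters[c].append(p)
        # Step 4: update each mean to the average of its assigned points.
        for i, cl in enumerate(clusters):
            if cl:
                means[i] = [sum(xs) / len(cl) for xs in zip(*cl)]
    return means
```

The update step is just the coordinate-wise average, which is exactly the part that stops being obvious once the distance is KL-divergence instead of Euclidean.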

However, if I'd like to use KL-divergence as my metric, how do I update my mean?

Jing asked Feb 02 '13 10:02

2 Answers

Clustering with KL-divergence may not be the best idea, because KLD lacks an important property of a metric: symmetry. The resulting clusters can then be quite hard to interpret. If you want to go ahead with KLD anyway, you could use the average of the two KLDs as your distance, i.e.

d(x,y) = KLD(x,y)/2 + KLD(y,x)/2
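As a quick sketch, the symmetrized distance above could be computed like this (assuming discrete probability distributions with strictly positive entries; the function names are mine):

```python
import math

def kld(p, q):
    """KL(p || q) for discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def sym_kld(p, q):
    """Symmetrized distance: d(x,y) = KLD(x,y)/2 + KLD(y,x)/2."""
    return 0.5 * kld(p, q) + 0.5 * kld(q, p)
```

Unlike `kld` itself, `sym_kld(p, q)` equals `sym_kld(q, p)`, which is the property clustering needs.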

mitchus answered Sep 18 '22 20:09


It is not a good idea to use KLD, for two reasons:

  1. It is not symmetric: KLD(x,y) ≠ KLD(y,x).
  2. You need to be careful when using KLD in code: the division inside the logarithm can produce Inf values, and hence NaN results, whenever an entry of the second distribution is zero.

The usual workaround, adding a small number to every entry, avoids the Inf/NaN problem but may affect accuracy.
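A sketch of that workaround, epsilon smoothing (the epsilon value and renormalization scheme here are arbitrary choices of mine, not a standard):

```python
import math

def kld_smoothed(p, q, eps=1e-10):
    """KL(p || q) with epsilon smoothing so zero entries don't produce Inf/NaN.

    Smoothing slightly distorts both distributions, so the result is only
    approximate -- this is the accuracy trade-off mentioned above.
    """
    n = len(p)
    # Add eps to every entry and renormalize so each still sums to 1.
    p = [(pi + eps) / (1 + n * eps) for pi in p]
    q = [(qi + eps) / (1 + n * eps) for qi in q]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Without the smoothing, an input like `p = [1.0, 0.0], q = [0.0, 1.0]` would hit `log(1.0 / 0.0)`; with it, the result is finite but no longer the exact divergence of the original distributions.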

Bashar Haddad answered Sep 18 '22 20:09