Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Affinity Propagation (sklearn) - strange behavior

Trying to use affinity propagation for a simple clustering task:

from sklearn.cluster import AffinityPropagation
c = [[0], [0], [0], [0], [0], [0], [0], [0]]
af = AffinityPropagation (affinity = 'euclidean').fit (c)
print (af.labels_)

I get this strange result: [0 1 0 1 2 1 1 0]

I would expect to have all samples in the same cluster, like in this case:

c = [[0], [0], [0]]
af = AffinityPropagation (affinity = 'euclidean').fit (c)
print (af.labels_)

which indeed puts all samples in the same cluster: [0 0 0]

What am I missing?

Thanks

like image 555
Baba Avatar asked Jun 14 '15 13:06

Baba


People also ask

What is damping in Affinity Propagation?

Damping factor in the range [0.5, 1.0) is the extent to which the current value is maintained relative to incoming values (weighted 1 - damping). This in order to avoid numerical oscillations when updating these values (messages).

How does Affinity Propagation clustering work?

In contrast to other traditional clustering methods, Affinity Propagation does not require you to specify the number of clusters. In layman's terms, in Affinity Propagation, each data point sends messages to all other points informing its targets of each target's relative attractiveness to the sender.

Which of the following parameters are used to control Affinity Propagation clustering?

Unlike clustering algorithms such as k-means or k-medoids, affinity propagation does not require the number of clusters to be determined or estimated before running the algorithm, for this purpose the two important parameters are the preference, which controls how many exemplars (or prototypes) are used, and the ...


1 Answers

I believe this is because your problem is essentially ill-posed (you pass lots of the same point to an algorithm which is trying to find similarity between different points). AffinityPropagation is doing matrix math under the hood, and your similarity matrix (which is all zeros) is nastily degenerate. In order to not error out, the implementation adds a small random matrix to the similarity matrix, preventing the algorithm from quitting when it encounters two of the same point.

like image 120
Andreus Avatar answered Sep 23 '22 07:09

Andreus