
Compute affinity matrix from distance matrix

I used Clustal Omega to get a distance matrix for 500 protein sequences (they are homologous to each other).

I want to use affinity propagation to cluster these sequences.

Initially, because I observed by hand that the distance matrix only had values between 0 and 1, with 0 distance = 100% identity, I reasoned that I could just take (1 - distance) to get affinity.

I ran my code, and the clusters looked reasonable, and I thought all was well... until I read that typically, affinity matrices are calculated from distance matrices by applying a "heat kernel". That's when all hell broke loose in my mind.

Did I misunderstand the concept of an affinity matrix? Is there an easy way of computing one? scikit-learn offers the following formula:

similarity = np.exp(-beta * distance / distance.std())

But what is beta? I know distance.std() is the standard deviation of the values in the distance matrix.

I'm quite confused and lost right now with the concepts involved (as opposed to the actual coding implementation), so any help is greatly appreciated!

P.S. I've tried posting to Biostars.org, but I haven't gotten an answer there...

Asked Oct 22 '22 11:10 by ericmjl


1 Answer

I think both 1 - distance and exp(-beta * distance) are valid ways to convert a distance into a similarity (though they differ in how they are interpreted in a probabilistic framework). I would simply use whichever gives the better results.
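
To compare the two conversions, a minimal sketch along these lines may help. It assumes the distance matrix is a symmetric NumPy array with values between 0 and 1. The toy data and the helper names linear_similarity and heat_kernel_similarity are made up for illustration and are not part of scikit-learn. In the heat kernel, beta is a free scaling parameter: a larger beta makes the similarity fall off faster as distance grows. Both matrices are passed to scikit-learn's AffinityPropagation with affinity="precomputed", so you can look at how the clusterings differ on your own data.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy stand-in for the real distance matrix (made-up data):
# 20 points on a line, forming two well-separated groups.
rng = np.random.default_rng(0)
points = np.concatenate([rng.normal(0.0, 0.05, 10), rng.normal(1.0, 0.05, 10)])
distance = np.abs(points[:, None] - points[None, :])
distance /= distance.max()  # scale into [0, 1], like the matrix in the question


def linear_similarity(distance):
    """Hypothetical helper: similarity as 1 - distance (assumes distances in [0, 1])."""
    return 1.0 - distance


def heat_kernel_similarity(distance, beta=1.0):
    """Hypothetical helper: the heat-kernel formula quoted in the question."""
    return np.exp(-beta * distance / distance.std())


candidates = {
    "1 - distance": linear_similarity(distance),
    "heat kernel, beta=1": heat_kernel_similarity(distance, beta=1.0),
    "heat kernel, beta=5": heat_kernel_similarity(distance, beta=5.0),
}

for name, similarity in candidates.items():
    # affinity="precomputed" tells scikit-learn that the input is already
    # a similarity matrix rather than a feature matrix.
    model = AffinityPropagation(affinity="precomputed", random_state=0)
    labels = model.fit_predict(similarity)
    print(f"{name}: {len(set(labels))} distinct labels")
```

Note that the number of clusters found by affinity propagation also depends on its preference parameter, which scikit-learn sets to the median of the input similarities by default, so changing the conversion changes that default as well.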

Answered Oct 24 '22 09:10 by UBod