Compute affinity matrix from distance matrix

Question

I used Clustal Omega to compute a distance matrix for 500 protein sequences (they are homologous to each other).

I want to use affinity propagation to cluster these sequences.

Initially, because I saw on inspection that the distance matrix only had values between 0 and 1, with a distance of 0 meaning 100% identity, I reasoned that I could just take (1 - distance) to get the affinity.

I ran my code, and the clusters looked reasonable, and I thought all was well... until I read that typically, affinity matrices are calculated from distance matrices by applying a "heat kernel". That's when all hell broke loose in my mind.

Did I misunderstand the concept of an affinity matrix? Is there an easy way to compute one from a distance matrix? scikit-learn offers the following formula:

similarity = np.exp(-beta * distance / distance.std())

But what is beta? I know distance.std() is the standard deviation of the distance.

I'm quite confused and lost right now with the concepts involved (as opposed to the actual coding implementation), so any help is greatly appreciated!

P.S. I've tried posting to Biostars.org, but I haven't gotten an answer there...

UBod · Accepted Answer

I think both 1 - distance and exp(-beta * distance) are valid ways to convert a distance into a similarity (though they are interpreted differently in a probabilistic framework). I would simply use whichever gives the better results.
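
To make the comparison concrete, here is a minimal sketch of the two conversions side by side. The function names and the toy 3x3 distance matrix are made up for illustration, and the heat kernel variant is the formula quoted in the question. As far as I can tell, beta is just a positive scaling factor that you choose yourself: the larger it is, the faster similarity decays as distance grows.

```python
import numpy as np


def linear_affinity(distance):
    """Similarity as 1 - distance; assumes all distances lie in [0, 1]."""
    distance = np.asarray(distance, dtype=float)
    return 1.0 - distance


def heat_kernel_affinity(distance, beta=1.0):
    """Similarity as exp(-beta * distance / distance.std()).

    beta is a free, positive scaling factor (hypothetical default of 1.0);
    dividing by the standard deviation rescales the distances first.
    """
    distance = np.asarray(distance, dtype=float)
    return np.exp(-beta * distance / distance.std())


# Toy symmetric distance matrix for three sequences (made-up values).
distance = np.array([
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.8],
    [0.9, 0.8, 0.0],
])

linear = linear_affinity(distance)
heat = heat_kernel_affinity(distance, beta=1.0)

print(linear)
print(heat)
```

Both transforms are monotonically decreasing in the distance, so they preserve the ranking of which pairs are closer; they differ only in how strongly large distances are penalised. Either matrix can then be handed to scikit-learn's AffinityPropagation with affinity="precomputed", and trying a few values of beta is probably the easiest way to see which gives more sensible clusters.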

Tags:

python

bioinformatics

affinity

ericmjl
