I try to score the already-aligned sequences. Let say
seq1 = 'PAVKDLGAEG-ASDKGT--SHVVY----------TI-QLASTFE'
seq2 = 'PAVEDLGATG-ANDKGT--LYNIYARNTEGHPRSTV-QLGSTFE'
with given parameters
substitution matrix : blosum62
gap open penalty : -5
gap extension penalty : -1
I did look through the biopython cookbook but all i can get is substitution matrix blogsum62 but I feel that it must have someone already implemented this kind of library.
So can anyone suggest any libraries or shortest code that can solve my problem?
Thx in advance
The score of an alignment, S, calculated as the sum of substitution and gap scores. Substitution scores are given by a look-up table (see PAM, BLOSUM). Gap scores are typically calculated as the sum of G, the gap opening penalty and L, the gap extension penalty. For a gap of length n, the gap cost would be G+Ln.
Scoring matrices are used to determine the relative score made by matching two characters in a sequence alignment. These are usually log-odds of the likelihood of two characters being derived from a common ancestral character.
Each value in the matrix is calculated by dividing the frequency of occurrence of the amino acid pair in the BLOCKS database, clustered at the 62% level, divided by the probability that the same two amino acids might align by chance.
The scoring process of MSA is based on the sum of the scores of all possible pairs of sequences in the multiple alignment according to some scoring matrix. You can refer my previous article to learn about the different scoring matrices and how to match them. where score(A, B) = pair-wise alignment score of A, B.
Jessada,
The Blosum62 matrix (note the spelling ;) is in Bio.SubsMat.MatrixInfo and is a dictionary with tuples resolving to scores (so ('A', 'A')
is worth 4 pts). It doesn't have the gaps, and it's only one triangle of the matrix (so it might ahve ('T', 'A') but not ('A', 'T'). There are some helper functions in Biopython, including some in Bio.Pairwise, but this is what I came up with as an answer:
from Bio.SubsMat import MatrixInfo
def score_match(pair, matrix):
if pair not in matrix:
return matrix[(tuple(reversed(pair)))]
else:
return matrix[pair]
def score_pairwise(seq1, seq2, matrix, gap_s, gap_e):
score = 0
gap = False
for i in range(len(seq1)):
pair = (seq1[i], seq2[i])
if not gap:
if '-' in pair:
gap = True
score += gap_s
else:
score += score_match(pair, matrix)
else:
if '-' not in pair:
gap = False
score += score_match(pair, matrix)
else:
score += gap_e
return score
seq1 = 'PAVKDLGAEG-ASDKGT--SHVVY----------TI-QLASTFE'
seq2 = 'PAVEDLGATG-ANDKGT--LYNIYARNTEGHPRSTV-QLGSTFE'
blosum = MatrixInfo.blosum62
score_pairwise(seq1, seq2, blosum, -5, -1)
Which returns 82 for your alignment. There's almost certianly prettier ways to do all of this, but that should be a good start.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With