Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a function that can calculate a score for aligned sequences given the alignment parameters?

I try to score the already-aligned sequences. Let say

seq1 = 'PAVKDLGAEG-ASDKGT--SHVVY----------TI-QLASTFE'
seq2 = 'PAVEDLGATG-ANDKGT--LYNIYARNTEGHPRSTV-QLGSTFE'

with given parameters

substitution matrix : blosum62
gap open penalty : -5
gap extension penalty : -1

I did look through the biopython cookbook but all i can get is substitution matrix blogsum62 but I feel that it must have someone already implemented this kind of library.

So can anyone suggest any libraries or shortest code that can solve my problem?

Thx in advance

like image 216
Jessada Thutkawkorapin Avatar asked Apr 16 '11 11:04

Jessada Thutkawkorapin


People also ask

How are alignment scores calculated?

The score of an alignment, S, calculated as the sum of substitution and gap scores. Substitution scores are given by a look-up table (see PAM, BLOSUM). Gap scores are typically calculated as the sum of G, the gap opening penalty and L, the gap extension penalty. For a gap of length n, the gap cost would be G+Ln.

How are scoring metrics used to identify alignments?

Scoring matrices are used to determine the relative score made by matching two characters in a sequence alignment. These are usually log-odds of the likelihood of two characters being derived from a common ancestral character.

How is Blosum alignment score calculated?

Each value in the matrix is calculated by dividing the frequency of occurrence of the amino acid pair in the BLOCKS database, clustered at the 62% level, divided by the probability that the same two amino acids might align by chance.

How do you score a multiple sequence alignment?

The scoring process of MSA is based on the sum of the scores of all possible pairs of sequences in the multiple alignment according to some scoring matrix. You can refer my previous article to learn about the different scoring matrices and how to match them. where score(A, B) = pair-wise alignment score of A, B.


1 Answers

Jessada,

The Blosum62 matrix (note the spelling ;) is in Bio.SubsMat.MatrixInfo and is a dictionary with tuples resolving to scores (so ('A', 'A') is worth 4 pts). It doesn't have the gaps, and it's only one triangle of the matrix (so it might ahve ('T', 'A') but not ('A', 'T'). There are some helper functions in Biopython, including some in Bio.Pairwise, but this is what I came up with as an answer:

from Bio.SubsMat import MatrixInfo

def score_match(pair, matrix):
    if pair not in matrix:
        return matrix[(tuple(reversed(pair)))]
    else:
        return matrix[pair]

def score_pairwise(seq1, seq2, matrix, gap_s, gap_e):
    score = 0
    gap = False
    for i in range(len(seq1)):
        pair = (seq1[i], seq2[i])
        if not gap:
            if '-' in pair:
                gap = True
                score += gap_s
            else:
                score += score_match(pair, matrix)
        else:
            if '-' not in pair:
                gap = False
                score += score_match(pair, matrix)
            else:
                score += gap_e
    return score

seq1 = 'PAVKDLGAEG-ASDKGT--SHVVY----------TI-QLASTFE'
seq2 = 'PAVEDLGATG-ANDKGT--LYNIYARNTEGHPRSTV-QLGSTFE'

blosum = MatrixInfo.blosum62

score_pairwise(seq1, seq2, blosum, -5, -1)

Which returns 82 for your alignment. There's almost certianly prettier ways to do all of this, but that should be a good start.

like image 198
david w Avatar answered Sep 24 '22 00:09

david w