I'm trying to understand the basic steps of the FASTA algorithm for searching a database for sequences similar to a query sequence. These are the steps of the algorithm:
I'm confused by the 3rd and 4th steps: how the PAM250 scoring matrix is used, and how to "join using gaps".
Can somebody explain these two steps for me "as specifically as possible"? Thanks
FASTA takes a given nucleotide or amino acid sequence and searches a corresponding sequence database by using local sequence alignment to find matches of similar database sequences. The FASTA program follows a largely heuristic method which contributes to the high speed of its execution.
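To make the "heuristic" part a bit more concrete: the first stage of FASTA is usually described as a word (k-tuple) lookup, where exact matches of length k between query and database sequence are collected and grouped by diagonal (query position minus subject position). Below is a minimal sketch of that idea only. The function name `ktuple_diagonal_hits` and the choice `k=2` are my own for illustration, not taken from the FASTA program.

```python
from collections import defaultdict


def ktuple_diagonal_hits(query, subject, k=2):
    """Count exact k-tuple matches per diagonal (query index - subject index).

    Illustrative sketch only: real FASTA goes on to pick the densest
    diagonal regions and rescore them; this just does the word lookup.
    """
    # Index every k-tuple of the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)

    # Scan the subject and record which diagonal each shared k-tuple falls on.
    diagonals = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            diagonals[i - j] += 1
    return dict(diagonals)


hits = ktuple_diagonal_hits("ACGTACGT", "TTACGTAA", k=2)
```

Diagonals with many hits are the candidates for the "initial regions" that the later steps rescore and join.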
In bioinformatics and biochemistry, the FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences.
What is FASTA format? FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
BLAST and FASTA are two similarity-searching programs that identify homologous DNA and protein sequences based on excess sequence similarity. Excess similarity between two DNA or amino acid sequences arises from common ancestry (homology).
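As a small illustration of the format described above, here is a minimal parser sketch. It assumes the description line starts with `>` (the usual convention) and that sequence data may be wrapped across several lines. The function name `parse_fasta` and the two records are made up for the example.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {description: sequence}."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts with a single-line description.
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence data may span several lines; concatenate them.
            records[header] += line
    return records


# Hypothetical example data, not real database entries.
example = """>seq1 hypothetical example protein
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical example DNA
ACGTACGT
"""
records = parse_fasta(example.splitlines())
```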
This is how FASTA works:
If there are insufficient initial regions to form an alignment in 3), the best score from 2) can be used to rank sequences by similarity. Scores from 3) and 4) can also be used for that purpose.
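For the "join using gaps" part, one way to picture it is as chaining: each rescored initial region has a score, two regions can be joined if the second starts after the first ends in both sequences, and each join costs a fixed penalty. The sketch below is only my reading of that idea, not the FASTA source code. The `Region` type, the function name and the `join_penalty=20` default are placeholders I chose, since I can't quote the original parameter values.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    q_start: int  # start in query (inclusive)
    q_end: int    # end in query (exclusive)
    s_start: int  # start in subject (inclusive)
    s_end: int    # end in subject (exclusive)
    score: int    # score of the region after rescoring


def best_joined_score(regions: List[Region], join_penalty: int = 20) -> int:
    """Best total score from chaining compatible regions, minus a penalty per join."""
    # Sorting by start means any region that can precede another comes first.
    ordered = sorted(regions, key=lambda r: (r.q_start, r.s_start))
    best: List[int] = []  # best[i] = best chain score ending at ordered[i]
    for idx, r in enumerate(ordered):
        score = r.score  # a chain may consist of this region alone
        for prev_idx in range(idx):
            p = ordered[prev_idx]
            # Regions are compatible if p ends before r starts in both sequences.
            if p.q_end <= r.q_start and p.s_end <= r.s_start:
                score = max(score, best[prev_idx] + r.score - join_penalty)
        best.append(score)
    return max(best, default=0)


regions = [
    Region(0, 10, 0, 10, 50),
    Region(12, 20, 15, 23, 40),
    Region(5, 15, 30, 40, 30),  # overlaps the first region in the query
]
joined = best_joined_score(regions, join_penalty=20)
single = best_joined_score(regions, join_penalty=60)
```

With the small penalty the first two regions are joined (50 + 40 - 20); with the large penalty joining doesn't pay off, and the result falls back to the best single region, which matches the fallback described above.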
Unfortunately my institution doesn't have access to the original FASTA paper so I can't supply the original values of the various parameters mentioned above.