How to get all DNA encodings for a peptide in C#

Hi, my head has been boiling for 3 days now! I want to get all DNA encodings for a peptide. A peptide is a sequence of amino acids, e.g. amino acids M and Q can form the peptide MQ or QM.

DNA encoding means there is a DNA code (called a codon) for each amino acid. Some amino acids have more than one code, e.g. amino acid T has 4 different codons.

The last function in the following code is not working, so I would like someone to help me make it work. And please, no Language Integrated Query (LINQ).

private string[] CODONS = {
    "TTT", "TTC", "TTA", "TTG", "TCT",
    "TCC", "TCA", "TCG", "TAT", "TAC", "TGT", "TGC", "TGG", "CTT",
    "CTC", "CTA", "CTG", "CCT", "CCC", "CCA", "CCG", "CAT", "CAC",
    "CAA", "CAG", "CGT", "CGC", "CGA", "CGG", "ATT", "ATC", "ATA",
    "ATG", "ACT", "ACC", "ACA", "ACG", "AAT", "AAC", "AAA", "AAG",
    "AGT", "AGC", "AGA", "AGG", "GTT", "GTC", "GTA", "GTG", "GCT",
    "GCC", "GCA", "GCG", "GAT", "GAC", "GAA", "GAG", "GGT", "GGC",
    "GGA", "GGG", };

private string[] AMINOS_PER_CODON = {
    "F", "F", "L", "L", "S", "S",
    "S", "S", "Y", "Y", "C", "C", "W", "L", "L", "L", "L", "P", "P",
    "P", "P", "H", "H", "Q", "Q", "R", "R", "R", "R", "I", "I", "I",
    "M", "T", "T", "T", "T", "N", "N", "K", "K", "S", "S", "R", "R",
    "V", "V", "V", "V", "A", "A", "A", "A", "D", "D", "E", "E", "G",
    "G", "G", "G", };


public string codonToAminoAcid(String codon)
{
    for (int k = 0; k < CODONS.Length; k++)
    {
        if (CODONS[k].Equals(codon))
        {
            return AMINOS_PER_CODON[k];
        }
    }

    // never reach here with valid codon
    return "X";
}

public string AminoAcidToCodon(String aminoAcid)
{
    for (int k = 0; k < AMINOS_PER_CODON.Length; k++)
    {
        if (AMINOS_PER_CODON[k].Equals(aminoAcid))
        {
            return CODONS[k];
        }
    }

    // never reach here with valid amino acid
    return "X";
}

public string GetCodonsforPeptide(string pep)
{
    string result = "";
    for (int i = 0; i < pep.Length; i++)
    {
        result = AminoAcidToCodon(pep.Substring(i, 1));
        for (int q = 0; q < pep.Length; q++)
        {
            result += AminoAcidToCodon(pep.Substring(q, 1));
        }
    }

    return result;
}
kobosh asked Oct 31 '22 14:10



1 Answer

Try using the following two methods:

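// Assumes AMINOS_PER_CODON is declared as char[] (see the notes below).
// Yields every codon for the amino acid instead of returning only the first one.
public IEnumerable<string> AminoAcidToCodon(char aminoAcid)
{
    for (int k = 0; k < AMINOS_PER_CODON.Length; k++)
    {
        if (AMINOS_PER_CODON[k] == aminoAcid)
        {
            yield return CODONS[k];
        }
    }
}

// Combines each codon of the first amino acid with every encoding of the
// rest of the peptide; an empty peptide has exactly one (empty) encoding.
public IEnumerable<string> GetCodonsforPeptide(string pep)
{
    if (string.IsNullOrEmpty(pep))
    {
        yield return string.Empty;
        yield break;
    }

    foreach (var codon in AminoAcidToCodon(pep[0]))
        foreach (var codonOfRest in GetCodonsforPeptide(pep.Substring(1)))
            yield return codon + codonOfRest;
}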

Notes:

  • Since an amino acid can have several matching codons, your AminoAcidToCodon method, which returns as soon as it finds the first match, only ever produces one codon per amino acid. Instead I created an iterator method that uses yield return to produce each matching codon.
  • The last method finds all matching codons for the first character of the peptide, and combines each such codon with all the codons made up of the rest of the peptide after the first character.
  • I changed the AMINOS_PER_CODON array to a char[] instead of a string[], so its elements can be compared directly with pep[0]. You can easily change the code to use your string array if you want.
  • A better approach without two separate arrays would be to create a dictionary mapping each single amino acid character to a list of codon strings.
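Putting the notes together, here is a minimal self-contained sketch of the answer's approach that compiles on its own. It assumes AMINOS_PER_CODON is declared as char[], as the notes describe; the class name PeptideEncoder, the static modifiers and the Main demo are additions for illustration, not part of the original answer. The tables are the question's 61 sense codons, so stop codons (TAA, TAG, TGA) never appear.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper class around the question's tables and the answer's two methods.
public static class PeptideEncoder
{
    // The question's 61 sense codons (no stop codons).
    private static readonly string[] CODONS =
    {
        "TTT", "TTC", "TTA", "TTG", "TCT",
        "TCC", "TCA", "TCG", "TAT", "TAC", "TGT", "TGC", "TGG", "CTT",
        "CTC", "CTA", "CTG", "CCT", "CCC", "CCA", "CCG", "CAT", "CAC",
        "CAA", "CAG", "CGT", "CGC", "CGA", "CGG", "ATT", "ATC", "ATA",
        "ATG", "ACT", "ACC", "ACA", "ACG", "AAT", "AAC", "AAA", "AAG",
        "AGT", "AGC", "AGA", "AGG", "GTT", "GTC", "GTA", "GTG", "GCT",
        "GCC", "GCA", "GCG", "GAT", "GAC", "GAA", "GAG", "GGT", "GGC",
        "GGA", "GGG",
    };

    // Same amino acids as the question's AMINOS_PER_CODON, but as char[].
    private static readonly char[] AMINOS_PER_CODON =
    {
        'F', 'F', 'L', 'L', 'S', 'S',
        'S', 'S', 'Y', 'Y', 'C', 'C', 'W', 'L', 'L', 'L', 'L', 'P', 'P',
        'P', 'P', 'H', 'H', 'Q', 'Q', 'R', 'R', 'R', 'R', 'I', 'I', 'I',
        'M', 'T', 'T', 'T', 'T', 'N', 'N', 'K', 'K', 'S', 'S', 'R', 'R',
        'V', 'V', 'V', 'V', 'A', 'A', 'A', 'A', 'D', 'D', 'E', 'E', 'G',
        'G', 'G', 'G',
    };

    // Yields every codon that encodes the given amino acid, in table order.
    public static IEnumerable<string> AminoAcidToCodon(char aminoAcid)
    {
        for (int k = 0; k < AMINOS_PER_CODON.Length; k++)
        {
            if (AMINOS_PER_CODON[k] == aminoAcid)
            {
                yield return CODONS[k];
            }
        }
    }

    // Combines each codon of the first amino acid with every encoding of the rest.
    public static IEnumerable<string> GetCodonsforPeptide(string pep)
    {
        if (string.IsNullOrEmpty(pep))
        {
            yield return string.Empty;
            yield break;
        }

        foreach (var codon in AminoAcidToCodon(pep[0]))
            foreach (var codonOfRest in GetCodonsforPeptide(pep.Substring(1)))
                yield return codon + codonOfRest;
    }

    public static void Main()
    {
        // Prints ATGGCT, ATGGCC, ATGGCA, ATGGCG, one per line.
        foreach (var dna in GetCodonsforPeptide("MA"))
            Console.WriteLine(dna);
    }
}
```

A letter with no codons in the table (for example X) makes the whole peptide yield nothing, because there is no codon to combine with the rest.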

Example output when passing in "MA":

ATGGCT 
ATGGCC 
ATGGCA 
ATGGCG 

This is because M maps to just one codon:

ATG

and A maps to these:

GCT 
GCC 
GCA 
GCG
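The number of encodings is the product of the codon counts of the individual amino acids: 1 × 4 = 4 for "MA". That product grows exponentially with peptide length (twenty leucines, with 6 codons each, already give 6^20 = 3,656,158,440,062,976 encodings), so it may be worth checking the count before enumerating. A minimal sketch, assuming a hypothetical CodonCounter class whose count table is deliberately partial (only M, W, Q, A, T, L, S and R, with their standard genetic code counts); BigInteger avoids overflow for long peptides.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical helper: counts DNA encodings without generating them.
public static class CodonCounter
{
    // Codons per amino acid in the standard genetic code (partial table for illustration).
    private static readonly Dictionary<char, int> CodonCount = new Dictionary<char, int>
    {
        { 'M', 1 }, { 'W', 1 }, { 'Q', 2 }, { 'A', 4 },
        { 'T', 4 }, { 'L', 6 }, { 'S', 6 }, { 'R', 6 },
    };

    // Product of the per-residue codon counts; 0 if any residue is not in the table.
    // An empty peptide gives 1, matching the single empty string the recursion yields.
    public static BigInteger CountEncodings(string pep)
    {
        BigInteger total = BigInteger.One;
        foreach (char aminoAcid in pep)
        {
            if (!CodonCount.TryGetValue(aminoAcid, out int n))
                return BigInteger.Zero;
            total *= n;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(CountEncodings("MA"));                // 4
        Console.WriteLine(CountEncodings(new string('L', 20))); // 3656158440062976
    }
}
```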

The dictionary I suggest you use would look like this (only M and A are shown):

var codonsByAminoAcid = new Dictionary<char, string[]>
{
    { 'M', new[] { "ATG" } },
    { 'A', new[] { "GCT", "GCC", "GCA", "GCG" } }
};

This would replace the AminoAcidToCodon method.

You can even build that dictionary from your two arrays (note that this uses LINQ, so it needs using System.Linq;):

var lookup = 
    CODONS
    .Zip(AMINOS_PER_CODON, (codon, amino) => new { codon, amino })
    .GroupBy(entry => entry.amino)
    .ToDictionary(
        g => g.Key,
        g => g.Select(ge => ge.codon).ToArray());
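Since the question asked for no LINQ, the same lookup can also be built with a plain loop. A minimal sketch, assuming the char[] version of AMINOS_PER_CODON from the notes; LookupBuilder and BuildLookup are hypothetical names, and the Main demo uses shortened tables. The values are List<string> instead of string[], which makes no difference to a foreach over lookup[pep[0]].

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds amino acid -> codons without LINQ.
public static class LookupBuilder
{
    // Groups codons by amino acid; each list keeps the codons in table order.
    public static Dictionary<char, List<string>> BuildLookup(string[] codons, char[] aminos)
    {
        if (codons.Length != aminos.Length)
            throw new ArgumentException("codons and aminos must have the same length");

        var lookup = new Dictionary<char, List<string>>();
        for (int k = 0; k < codons.Length; k++)
        {
            // Create the list the first time this amino acid is seen.
            if (!lookup.TryGetValue(aminos[k], out List<string> list))
            {
                list = new List<string>();
                lookup[aminos[k]] = list;
            }
            list.Add(codons[k]);
        }
        return lookup;
    }

    public static void Main()
    {
        // Shortened tables for illustration; pass the full 61-entry arrays in practice.
        string[] codons = { "ATG", "GCT", "GCC", "GCA", "GCG" };
        char[] aminos = { 'M', 'A', 'A', 'A', 'A' };

        var lookup = BuildLookup(codons, aminos);
        Console.WriteLine(string.Join(", ", lookup['A'])); // GCT, GCC, GCA, GCG
    }
}
```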

The GetCodonsforPeptide method could then look like this:

public IEnumerable<string> GetCodonsforPeptide(string pep)
{
    if (string.IsNullOrEmpty(pep))
    {
        yield return string.Empty;
        yield break;
    }

    foreach (var codon in lookup[pep[0]])
        foreach (var codonOfRest in GetCodonsforPeptide(pep.Substring(1)))
            yield return codon + codonOfRest;
}

i.e. replace the call to AminoAcidToCodon with a lookup in the dictionary.
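The recursion can also be unrolled into the kind of nested loop the question was attempting: start from a single empty string and, for each amino acid, extend every partial encoding with every codon of that amino acid. A minimal sketch, assuming a Dictionary<char, string[]> like the one suggested above; IterativeEncoder and GetCodonsforPeptideIterative are hypothetical names, and the table holds only M, Q and A for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical non-recursive variant of GetCodonsforPeptide.
public static class IterativeEncoder
{
    // Partial table for illustration; build the full one from CODONS/AMINOS_PER_CODON.
    private static readonly Dictionary<char, string[]> Lookup = new Dictionary<char, string[]>
    {
        { 'M', new[] { "ATG" } },
        { 'Q', new[] { "CAA", "CAG" } },
        { 'A', new[] { "GCT", "GCC", "GCA", "GCG" } },
    };

    // Extends every partial encoding by every codon of the next amino acid.
    public static List<string> GetCodonsforPeptideIterative(string pep)
    {
        var results = new List<string> { string.Empty };
        foreach (char aminoAcid in pep)
        {
            if (!Lookup.TryGetValue(aminoAcid, out string[] codons))
                return new List<string>(); // unknown amino acid: no encodings

            var next = new List<string>(results.Count * codons.Length);
            foreach (string prefix in results)
                foreach (string codon in codons)
                    next.Add(prefix + codon);
            results = next;
        }
        return results;
    }

    public static void Main()
    {
        // Prints ATGCAA, then ATGCAG.
        foreach (string dna in GetCodonsforPeptideIterative("MQ"))
            Console.WriteLine(dna);
    }
}
```

Unlike the lazy yield return version, this builds every encoding in memory at once, so it only suits short peptides. Its output order matches the recursive version: the codon of the first amino acid varies slowest.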

Lasse V. Karlsen answered Nov 15 '22 05:11
