How do I convert three-letter amino acid codes to one-letter codes with Python or R?

I have a FASTA file as shown below, with each sequence written in three-letter amino acid codes. I would like to convert these to one-letter codes. How can I do this with Python or R?

>2ppo
ARGHISLEULEULYS
>3oot
METHISARGARGMET

Desired output:

>2ppo
RHLLK
>3oot
MHRRM

Your suggestions would be appreciated!

user1725152 asked Oct 06 '12 13:10


2 Answers

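Biopython already ships built-in dictionaries to help with such translations. The following commands list everything the module provides:

from Bio.Data import IUPACData
help(IUPACData)

The predefined dictionary you are looking for is protein_letters_3to1. Its keys are capitalized ('Ala', not 'ALA'), so upper-case input such as 'ARG' has to be converted with str.capitalize() before the lookup:

IUPACData.protein_letters_3to1['Ala']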
Henk Neefs answered Oct 12 '22 09:10


Use a dictionary to look up the one letter codes:

d = {'CYS': 'C', 'ASP': 'D', 'SER': 'S', 'GLN': 'Q', 'LYS': 'K',
     'ILE': 'I', 'PRO': 'P', 'THR': 'T', 'PHE': 'F', 'ASN': 'N',
     'GLY': 'G', 'HIS': 'H', 'LEU': 'L', 'ARG': 'R', 'TRP': 'W',
     'ALA': 'A', 'VAL': 'V', 'GLU': 'E', 'TYR': 'Y', 'MET': 'M'}

And a simple function to match the three letter codes with one letter codes for the entire string:

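def shorten(x):
    """Convert a string of three-letter codes to one-letter codes."""
    if len(x) % 3 != 0:
        raise ValueError('Input length should be a multiple of three')

    y = ''
    # Walk the string in chunks of three and look each chunk up in d.
    for i in range(len(x) // 3):
        y += d[x[3 * i : 3 * i + 3]]
    return y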

Testing your example:

>>> shorten('ARGHISLEULEULYS')
'RHLLK'
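
The question asks about a whole FASTA file rather than a single string, so here is a minimal sketch of how the same lookup could be applied line by line. It assumes each record's sequence sits on one line, as in the question. The names THREE_TO_ONE and convert_fasta are made up for this illustration, and an in-memory io.StringIO stands in for the real file:

```python
import io

# Three-letter (upper-case) to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V',
}


def shorten(seq):
    """Convert a string of concatenated three-letter codes to one-letter codes."""
    seq = seq.strip().upper()
    if len(seq) % 3 != 0:
        raise ValueError('Input length should be a multiple of three')
    # Step through the string three characters at a time.
    return ''.join(THREE_TO_ONE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def convert_fasta(lines):
    """Yield header lines unchanged and sequence lines in one-letter code.

    Assumption: every sequence occupies a single line (hypothetical helper,
    not part of any library).
    """
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        yield line if line.startswith('>') else shorten(line)


# Stand-in for open('input.fasta'); the file name would be your own.
example = io.StringIO(">2ppo\nARGHISLEULEULYS\n>3oot\nMETHISARGARGMET\n")
result = list(convert_fasta(example))
print('\n'.join(result))
```

For a real file, passing the open file object to convert_fasta instead of the StringIO should work the same way. A code that is not in the dictionary raises a KeyError, which is probably what you want for catching non-standard residues.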
Junuxx answered Oct 12 '22 10:10