I am using Python to write a program that translates a set of DNA sequences into amino acid (protein) sequences. I then need to search for a specific subsequence and count the number of sequences in which it is present. This is the code I have so far:
# Genetic code: maps each RNA codon to the amino acid (or "STOP") it translates into
genetic_code = {
    "UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
    "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
    "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
    "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
    "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
    "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
    "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
    "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
    "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
    "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
    "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
    "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
    "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
    "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
    "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
    "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",
}

enterokinase_motif = "DDDDK"
proline_motif = "DDDDKP"  # defined for later use; it does not change the count below
motif_number = 0

# Open cDNA_sequences file and read in line by line
with open('cDNA_sequences.csv', 'r') as results:
    for line in results:
        columns = line.rstrip("\n").split(",")  # remove end of line characters and split on commas to produce a list
        ensemblID = columns[0]  # ensemblID is first element in our list
        dna_seq = columns[1]  # dna_seq is second element in our list
        rna_seq = dna_seq.upper().replace("T", "U")  # the table uses RNA codons, so replace T with U
        protein = ""
        for i in range(0, len(rna_seq) - 2, 3):  # step through complete codons only
            codon = rna_seq[i:i+3]
            protein += genetic_code[codon]
        print(protein)
        if enterokinase_motif in protein:  # search the translated protein, not the raw line
            motif_number = motif_number + 1

print("The number of sequences containing one or more enterokinase motifs is {}".format(motif_number))
I am having trouble writing the code that converts the DNA sequences to protein sequences.
You should take a look at Biopython. It comes with handy functions and classes for biology and bioinformatics.
It has a function that does what you are looking for: Bio.Seq.translate
Here is a code example:
>>> from Bio.Seq import translate
>>> coding_dna = "GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
>>> translate(coding_dna)
'VAIVMGR*KGAR*'
>>> translate(coding_dna, stop_symbol="@")
'VAIVMGR@KGAR@'
>>> translate(coding_dna, to_stop=True)
'VAIVMGR'