Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reverse complement of DNA strand using Python

I have a DNA sequence and would like to get reverse complement of it using Python. It is in one of the columns of a CSV file and I'd like to write the reverse complement to another column in the same file. The tricky part is, there are a few cells with something other than A, T, G and C. I was able to get reverse complement with this piece of code:

def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'} 
    bases = list(seq) 
    bases = [complement[base] for base in bases] 
    return ''.join(bases)
    def reverse_complement(s):
        return complement(s[::-1])

    print "Reverse Complement:"
    print(reverse_complement("TCGGGCCC"))

However, when I try to find the item which is not present in the complement dictionary, using the code below, I just get the complement of the last base. It doesn't iterate. I'd like to know how I can fix it.

def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'} 
    bases = list(seq) 
    for element in bases:
        if element not in complement:
            print element  
        letters = [complement[base] for base in element] 
        return ''.join(letters)
def reverse_complement(seq):
    return complement(seq[::-1])

print "Reverse Complement:"
print(reverse_complement("TCGGGCCCCX"))
like image 729
user3783999 Avatar asked Aug 07 '14 17:08

user3783999


People also ask

How do you find the reverse complement of a DNA sequence?

Normally, DNA occurs as a double strand where each A is paired with a T and vice versa, and each C is paired with a G and vice versa. The reverse complement of a DNA sequence is formed by reversing the letters, interchanging A and T and interchanging C and G. Thus the reverse complement of ACCTGAG is CTCAGGT.

Which function is used for calculating reverse complement?

reverse_complement() computes the reverse complement (duh).

What is reverse complement DNA?

Reverse Complement. Reverse Complement converts a DNA sequence into its reverse, complement, or reverse-complement counterpart. The entire IUPAC DNA alphabet is supported, and the case of each input sequence character is maintained.


2 Answers

The other answers are perfectly fine, but if you plan to deal with real DNA sequences I suggest using Biopython. What if you encounter a character like "-", "*" or indefinitions? What if you want to do further manipulations of your sequences? Do you want to create a parser for each file format out there?

The code you ask for is as easy as:

from Bio.Seq import Seq

seq = Seq("TCGGGCCC")

print seq.reverse_complement()
# GGGCCCGA

Now if you want to do another transformations:

print seq.complement()
print seq.transcribe()
print seq.translate()

Outputs

AGCCCGGG
UCGGGCCC
SG

And if you run into strange chars, no need to keep adding code to your program. Biopython deals with it:

seq = Seq("TCGGGCCCX")
print seq.reverse_complement()
# XGGGCCCGA
like image 57
xbello Avatar answered Sep 18 '22 14:09

xbello


The fastest one liner for reverse complement is the following:

def rev_compl(st):
    nn = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return "".join(nn[n] for n in reversed(st))
like image 21
alphahmed Avatar answered Sep 18 '22 14:09

alphahmed