Reverse complement of DNA strand using Python

Tags: python, list, bioinformatics, dna-sequence, biopython

I have a DNA sequence and would like to get its reverse complement using Python. The sequence is in one of the columns of a CSV file, and I'd like to write the reverse complement to another column in the same file. The tricky part is that a few cells contain something other than A, T, G and C. I was able to get the reverse complement with this piece of code:

def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    bases = list(seq)
    bases = [complement[base] for base in bases]
    return ''.join(bases)

def reverse_complement(s):
    return complement(s[::-1])

print("Reverse Complement:")
print(reverse_complement("TCGGGCCC"))

However, when I try to find the item which is not present in the complement dictionary, using the code below, I just get the complement of the last base. It doesn't iterate. I'd like to know how I can fix it.

def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    bases = list(seq)
    for element in bases:
        if element not in complement:
            print(element)
        letters = [complement[base] for base in element]
        return ''.join(letters)

def reverse_complement(seq):
    return complement(seq[::-1])

print("Reverse Complement:")
print(reverse_complement("TCGGGCCCCX"))

asked Aug 07 '14 17:08

user3783999

2 Answers

The other answers are perfectly fine, but if you plan to work with real DNA sequences I suggest using Biopython. What if you encounter a character like "-" or "*", or an ambiguous base such as N? What if you want to manipulate your sequences further? Do you want to write a parser for each file format out there?

The code you ask for is as easy as:

from Bio.Seq import Seq

seq = Seq("TCGGGCCC")

print(seq.reverse_complement())
# GGGCCCGA

If you want to apply other transformations:

print(seq.complement())
print(seq.transcribe())
print(seq.translate())

Output:

AGCCCGGG
UCGGGCCC
SG

If you run into unusual characters, there is no need to keep adding code to your program; Biopython handles them:

seq = Seq("TCGGGCCCX")
print(seq.reverse_complement())
# XGGGCCCGA
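
For the CSV part of the question, here is a minimal sketch of how the new column could be filled in. It uses only the standard library so that it runs without Biopython; the column names `sequence` and `reverse_complement`, the `id` column and the helper `add_reverse_complement` are made up for illustration, and the translation table simply passes unknown characters through rather than handling the full set of ambiguity codes.

```python
import csv
import io

# Pass-through table: characters other than A/C/G/T (either case) are left
# unchanged. This is a simplification, not a replacement for Biopython.
TABLE = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    # Complement every base, then reverse the result.
    return seq.translate(TABLE)[::-1]

def add_reverse_complement(rows, source="sequence", target="reverse_complement"):
    """Yield a copy of each row dict with the reverse complement added."""
    for row in rows:
        row = dict(row)
        row[target] = reverse_complement(row[source])
        yield row

# In-memory stand-in for the real CSV file (hypothetical headers and data).
infile = io.StringIO("id,sequence\n1,TCGGGCCC\n2,TCGGGCCCX\n")
reader = csv.DictReader(infile)

outfile = io.StringIO()
writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames + ["reverse_complement"])
writer.writeheader()
writer.writerows(add_reverse_complement(reader))

print(outfile.getvalue())
```

With Biopython installed, `str(Seq(s).reverse_complement())` could replace the helper. To update a real file, one option is to write to a temporary file and then replace the original once writing has finished.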

answered Sep 18 '22 14:09

xbello

A compact function for the reverse complement is the following (note that it raises a KeyError for any character other than A, C, G or T):

def rev_compl(st):
    nn = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return "".join(nn[n] for n in reversed(st))
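
As for why the loop in the question does not iterate: the `return` sits inside the `for` body, so the function exits during the first pass, and the list comprehension runs over `element` (a single character) instead of the whole sequence. Because the sequence is reversed first, that first element is the last base of the original, which matches what you see. A character such as X would also raise a KeyError in that lookup. Below is one possible fix, as a sketch: it assumes unknown characters should be reported and kept unchanged, which may not be what you want for your data.

```python
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def complement(seq):
    """Complement each base; report and keep characters not in the table."""
    letters = []
    for base in seq:
        if base not in COMPLEMENT:
            print("Not a standard base:", base)
        # dict.get falls back to the original character for unknown input
        letters.append(COMPLEMENT.get(base, base))
    # Return only after the whole sequence has been processed.
    return ''.join(letters)

def reverse_complement(seq):
    return complement(seq[::-1])

print(reverse_complement("TCGGGCCCCX"))
# XGGGGCCCGA
```

If unknown characters should instead become a placeholder, the fallback in `COMPLEMENT.get(base, base)` can be changed to something like `'N'`.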

answered Sep 18 '22 14:09

alphahmed
