Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python bitarray reverse complement

I am using Python's bitarray module to convert a DNA sequence, that is written in a binary file, to its reverse complement. Each nucleotide is represented by two bits in the following format:

A - 00, C - 01, G - 10, T - 11.

For example, the reverse complement of
AGCTACGG (00 10 01 11 00 01 10 10) would be CCGTAGCT (01 01 10 11 00 10 01 11).

This sequence takes up exactly 16 bits (2 bytes), but a sequence of length 9 would take 18 bits and it is padded to take up 24 bits (3 bytes).

At the moment I use a for cycle for the conversion, but this solution is dreadfully slow.

def reverse_complement( my_bitarray, seq_length ):

    for i in range(0, 2 * seq_length - 1, 2):

        if my_bitarray[i] == my_bitarray[i + 1]:

            if my_bitarray[i] == 0:
                my_bitarray[i], my_bitarray[i + 1] = 1, 1

            else:
                my_bitarray[i], my_bitarray[i + 1] = 0, 0

    #padding if the bitarray is not a multiple of 8 bits in length
    if seq_length / 4 != int():
        my_bitarray.reverse()
        my_bitarray.fill()
        my_bitarray.reverse()

    return my_bitarray

a = bitarray()
a.frombytes(seq[::-1])
b = a[int(seq_start)::] # seq without padding
b.reverse()

reverse_complement(b, seq_length)

Any tips on how to make this process faster?

like image 960
krlk89 Avatar asked May 16 '26 15:05

krlk89


1 Answers

The code you provided doesn't give the answer you indicated.

Here is code that gives the correct answer. Perhaps it will also be fast enough:

def reverse_complement(my_bitarray):
    # First reverse by twos
    my_bitarray = zip(my_bitarray[0::2], my_bitarray[1::2])
    my_bitarray = reversed(list(my_bitarray))
    my_bitarray = (i for t in my_bitarray for i in t)
    my_bitarray = bitarray(my_bitarray)

    # Then complement
    my_bitarray.invert()
    return my_bitarray

Note that you don't have to worry about the padding. bitarray.bitarray() manages all of that for you.

like image 71
Robᵩ Avatar answered May 18 '26 03:05

Robᵩ