Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to optimize a python script which runs for 4**k times?

Programming language: Python 3.4

I have written a program for the Bioinformatics 1 course from Coursera. The program is working all right, but is very slow for large datasets. I guess, it is because the loop is running for 4**k times, where k is the length of the sub-string that is passed into the function. Input: Strings Text and Pattern along with an integer d. Output: All starting positions where Pattern appears as a substring of Text with at most d mismatches.

This is my code:

def MotifCount(string1, substring, d):
    k = 4 ** (len(substring))
    codeArray = list(itertools.product(['A', 'C', 'G', 'T'], repeat=len(substring)))
    for i in range(k):
        codeArray2 = ''.join(list(codeArray[i]))
        HammingValue = HammingDistance(codeArray2, substring)
        if HammingValue <= d:
            for j in range(len(string1)):
                if(string1.find(codeArray2, j) == j):
                    print(j)



def HammingDistance(string_1, string_2):
    length_1 = len(string_1)
    length_2 = len(string_2)
    count = 0
    for i in range(length_1):
        if string_1[i] != string_2[i]:
            count += 1
    return count

Sample Input:

CGCCCGAATCCAGAACGCATTCCCATATTTCGGGACCACTGGCCTCCACGGTACGGACGTCAATCAAAT
ATTCTGGA
3

Output:

6 7 26 27

I want to optimize this code for bigger data sets. Is there any way to reduce the run time of the code?

like image 988
DarkRose Avatar asked Jun 27 '15 07:06

DarkRose


1 Answers

import itertools

def HammingDistance(string_1, string_2):
    assert len(string_1) == len(string_2)
    return sum(c1 != c2 for c1, c2 in zip(string_1, string_2))

def MotifCount(string1, substring, d):
    for i in range(len(string1) - len(substring) + 1):
        if HammingDistance(string1[i:i+len(substring)], substring) <= d:
            print(i)

MotifCount("CGCCCGAATCCAGAACGCATTCCCATATTTCGGGACCACTGGCCTCCACGGTACGGACGTCAATCAAAT", "ATTCTGGA", 3)

It gives:

6
7
26
27

Quickly.

like image 164
dlask Avatar answered Oct 25 '22 16:10

dlask