Replacing nested for loops and value assignment with a list comprehension

I've written a function that counts the occurrences of certain characters (A, C, G and T) at each position across multiple strings and saves the counts in a dictionary.

For example, given the two strings 'ACGG' and 'CAGT', it should return:

{'A': [1, 1, 0, 0], 'C': [1, 1, 0, 0], 'G': [0, 0, 2, 1], 'T': [0, 0, 0, 1]}

I want to convert the code below to a list comprehension to optimize it for speed. It uses two nested for loops, and the input Motifs is a list of equal-length strings made up of the characters A, C, G and T.

def CountWithPseudocounts(Motifs):
    count = {}
    k = len(Motifs[0])
    t = len(Motifs)
    for s in 'ACGT':
        count[s] = [0] * k
    for i in range(t):
        for j in range(k):
            symbol = Motifs[i][j]
            count[symbol][j] += 1
    return count

I've tried replacing the nested for loops at the bottom of the function with this list comprehension:

count = [ [ count[Motifs[i][j]][j] += 1 ] for i in range(0, t) ] for j in range(0, k)]

It doesn't work, probably because the += 1 assignment is not allowed inside a list comprehension. How can I work around this?

DavidK11 asked Mar 08 '17 13:03


2 Answers

You can use zip():

In [10]: a = 'ACGG'           

In [11]: b = 'CAGT'

In [12]: chars = ['A', 'C', 'G', 'T'] 

In [13]: [[(ch==i) + (ch==j) for i, j in zip(a, b)] for ch in chars]
Out[13]: [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 2, 1], [0, 0, 0, 1]]

If you want a dictionary you can use a dict comprehension:

In [25]: {ch:[(ch==i) + (ch==j) for i, j in zip(a, b)] for ch in chars}
Out[25]: {'T': [0, 0, 0, 1], 'G': [0, 0, 2, 1], 'C': [1, 1, 0, 0], 'A': [1, 1, 0, 0]}

Or, if you want the result in the same order as your character list, you can use collections.OrderedDict:

In [26]: from collections import OrderedDict

In [27]: OrderedDict((ch, [(ch==i) + (ch==j) for i, j in zip(a, b)]) for ch in chars)
Out[27]: OrderedDict([('A', [1, 1, 0, 0]), ('C', [1, 1, 0, 0]), ('G', [0, 0, 2, 1]), ('T', [0, 0, 0, 1])])
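
As a side note, on Python 3.7 and later a plain dict preserves insertion order, so the dict comprehension shown earlier should already return its keys in the order of chars, and OrderedDict is mainly needed on older interpreters. A minimal sketch, assuming Python 3.7+ and reusing the names a, b and chars from the session above:

```python
# Assumes Python 3.7+, where dict insertion order is a language guarantee.
a = 'ACGG'
b = 'CAGT'
chars = ['A', 'C', 'G', 'T']

counts = {ch: [(ch == i) + (ch == j) for i, j in zip(a, b)] for ch in chars}

# Keys come out in the same order as chars.
print(list(counts))  # ['A', 'C', 'G', 'T']
```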

If you still need more performance, and/or you're dealing with long strings and larger data sets, you can use NumPy to solve this problem through a vectorized method.

In [59]: import numpy as np

In [60]: chars = np.array(['A', 'C', 'G', 'T'])

In [61]: pairs = np.array((list(a), list(b))).T

In [62]: chars
Out[62]: 
array(['A', 'C', 'G', 'T'], 
      dtype='<U1')

In [63]: (chars[:,None,None] == pairs).sum(2)
Out[63]: 
array([[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 2, 1],
       [0, 0, 0, 1]])
Mazdak answered Sep 20 '22 15:09


You indeed cannot perform an assignment inside a list comprehension (although you can trigger side effects by calling functions). A list comprehension expects an expression, and += 1 is a statement. Furthermore, it is odd to assign the result to count while also updating the old count inside the comprehension.
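
To illustrate the side-effect remark, here is a minimal sketch of a comprehension that calls a helper function to do the update. The names count_with_side_effects and bump are made up for this example, and the pattern is shown only to make the point concrete; it is generally considered poor style because the list it builds is thrown away.

```python
def count_with_side_effects(motifs):
    """Count symbols per position, using a comprehension only for its side effects."""
    k = len(motifs[0])
    count = {s: [0] * k for s in 'ACGT'}

    def bump(symbol, j):
        # The augmented assignment lives in a function body, where statements are allowed.
        count[symbol][j] += 1

    # Builds and discards a list of None values; only the calls to bump() matter.
    [bump(symbol, j) for motif in motifs for j, symbol in enumerate(motif)]
    return count


print(count_with_side_effects(['ACGG', 'CAGT']))
# {'A': [1, 1, 0, 0], 'C': [1, 1, 0, 0], 'G': [0, 0, 2, 1], 'T': [0, 0, 0, 1]}
```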

One way to do this with a dictionary comprehension and a list comprehension, although it is not very efficient, is:

chars = 'ACGT'

a = 'ACGG'
b = 'CAGT'

sequences = list(zip(a,b))

counts = {char:[seq.count(char) for seq in sequences] for char in chars}

(credits to @Chris_Rands for the seq.count(char) suggestion)

This produces:

{'G': [0, 0, 2, 1], 'A': [1, 1, 0, 0], 'C': [1, 1, 0, 0], 'T': [0, 0, 0, 1]}

You can easily generalize the solution to count more strings by calling zip(..) with more strings.
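
As a sketch of that generalization, assuming the strings are held in a list, the list can be unpacked into zip with the * operator. The function name count_columns and its alphabet parameter are illustrative and not part of the original code. Note that zip stops at the shortest string, so this assumes all inputs have the same length.

```python
def count_columns(motifs, alphabet='ACGT'):
    """Count each symbol of `alphabet` per position across any number of strings."""
    # zip(*motifs) yields one tuple per position, e.g. ('A', 'C', 'A') for position 0.
    columns = list(zip(*motifs))
    return {ch: [col.count(ch) for col in columns] for ch in alphabet}


motifs = ['ACGG', 'CAGT', 'AAGT']
print(count_columns(motifs))
# {'A': [2, 2, 0, 0], 'C': [1, 1, 0, 0], 'G': [0, 0, 3, 1], 'T': [0, 0, 0, 2]}
```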

You can also optimize the algorithm itself. This will probably be more effective, since you then loop over each string only once and use a dictionary lookup for each symbol:

def CountWithPseudocounts(sequences):
    k = len(sequences[0])
    # One list of k zeros per symbol.
    count = {char: [0] * k for char in 'ACGT'}
    for sequence in sequences:
        # enumerate supplies the position j alongside each symbol.
        for j, symbol in enumerate(sequence):
            count[symbol][j] += 1
    return count
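
Whether the single-pass loop actually beats the comprehension depends on the input sizes and the interpreter, so it is worth measuring. Below is a rough timing sketch using timeit on randomly generated motifs; the function names, the data sizes (200 strings of length 50) and the repeat count are arbitrary choices for illustration, and no particular result is implied.

```python
import random
import timeit


def count_by_scanning(motifs):
    # Comprehension version: scans every column once per symbol.
    columns = list(zip(*motifs))
    return {ch: [col.count(ch) for col in columns] for ch in 'ACGT'}


def count_single_pass(motifs):
    # Loop version: visits every character exactly once.
    k = len(motifs[0])
    count = {ch: [0] * k for ch in 'ACGT'}
    for motif in motifs:
        for j, symbol in enumerate(motif):
            count[symbol][j] += 1
    return count


random.seed(0)
motifs = [''.join(random.choice('ACGT') for _ in range(50)) for _ in range(200)]

# Both versions must agree before their timings are worth comparing.
assert count_by_scanning(motifs) == count_single_pass(motifs)

for fn in (count_by_scanning, count_single_pass):
    elapsed = timeit.timeit(lambda: fn(motifs), number=20)
    print(f'{fn.__name__}: {elapsed:.4f} s')
```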

EDIT:

If you want to add one to every element of the counts (a pseudocount), you can use:

counts = {char:[seq.count(char)+1 for seq in sequences] for char in chars}
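
The same adjustment can be sketched for the loop-based version by starting every counter at the pseudocount instead of zero. The snake_case name and the pseudocount parameter are assumptions made for this example, not part of the original function.

```python
def count_with_pseudocounts(sequences, pseudocount=1):
    """Per-position symbol counts, with every counter starting at `pseudocount`."""
    k = len(sequences[0])
    # Starting at the pseudocount has the same effect as adding it to every final count.
    count = {char: [pseudocount] * k for char in 'ACGT'}
    for sequence in sequences:
        for j, symbol in enumerate(sequence):
            count[symbol][j] += 1
    return count


print(count_with_pseudocounts(['ACGG', 'CAGT']))
# {'A': [2, 2, 1, 1], 'C': [2, 2, 1, 1], 'G': [1, 1, 3, 2], 'T': [1, 1, 1, 2]}
```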
Willem Van Onsem answered Sep 17 '22 15:09