
Fastest way to iterate over multiple list comprehensions

I have the following code:

import numpy as np

def func(value, start=None, end=None):
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)

data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]
data_dict = [{} for _ in range(len(starts))]

for ii, (start, stop) in enumerate(zip(starts, stops)):
    range_array = np.arange(start, stop, 2)
    data_dict[ii]['one'] = [func(value, 0, 8) for value in data[range_array]]
    data_dict[ii]['two'] = [func(value, 9, 17) for value in data[range_array]]
    data_dict[ii]['three'] = [func(value, 27, 27) for value in data[range_array]]
    data_dict[ii]['four'] = [func(value, 28, 28) for value in data[range_array]]

The problem is that this code runs relatively slowly, yet every other approach I have tried so far is even slower. Does anyone have an idea how to rewrite it so that it runs faster?

Konstantin Fehler asked Oct 19 '25 05:10



1 Answer

You can use NumPy broadcasting to vectorize the bit extraction: shift each value right with >> and then mask off the wanted bits with bitwise AND (&).

import numpy as np

np.random.seed(100)
data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]

# equal to 'start' from calling func(value, start, end)
shift = np.array([0,9,27,28])[:, None]

# equal to 'end - start + 1' from calling func(value, start, end)
bitmask = np.array([9,9,1,1])[:, None]
  
d = [(data[start:stop:2] >> shift) & (2**bitmask - 1) for start, stop in zip(starts, stops)]
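
As a rough sanity check of why the shift and mask reproduce func, the sketch below compares both on random values. The helper name extract_bits and the sample size are assumptions made for illustration; they are not part of the question or of this answer's code.

```python
import numpy as np


def func(value, start=None, end=None):
    # Reference implementation from the question: slice the 32-bit binary string.
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)


def extract_bits(values, start, end):
    # Hypothetical helper: keep bits start..end inclusive (bit 0 = least significant).
    width = end - start + 1
    return (values >> start) & ((1 << width) - 1)


rng = np.random.default_rng(0)
sample = rng.integers(1, 429496729, size=1000)

# The four (start, end) pairs used in the question.
fields = [(0, 8), (9, 17), (27, 27), (28, 28)]

# For each field, check that the vectorized result equals the string-slicing result.
matches = {
    (start, end): extract_bits(sample, start, end).tolist()
    == [func(int(v), start, end) for v in sample]
    for start, end in fields
}
print(matches)
```

Slicing the last end + 1 characters of the binary string and dropping the final start characters keeps the same bits as shifting right by start and masking end - start + 1 bits, which is where the shift and bitmask arrays come from.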

Each element of the result list d is a 2D array with one row per bit field. To access the first range:

d[0]

Output

array([[ 54, 227, 291, 281, 229,  59, 508,  87, 365, 416],
       [ 40, 207, 353, 168, 214, 271, 338, 268, 419,  52],
       [  1,   0,   0,   0,   0,   0,   1,   1,   0,   0],
       [  0,   1,   1,   1,   0,   0,   0,   1,   1,   0]])
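
The four-row shape comes from broadcasting a column of shifts against a 1D slice of data. A minimal sketch with made-up toy values (the numbers and the two 2-bit fields are assumptions chosen to be easy to check by hand):

```python
import numpy as np

# Three toy values, written in binary so the bit fields are easy to read.
values = np.array([0b1011, 0b0110, 0b1111])

# Column vectors: one row per bit field, shape (2, 1).
shift = np.array([0, 2])[:, None]   # field 0 starts at bit 0, field 1 at bit 2
width = np.array([2, 2])[:, None]   # both fields are 2 bits wide

# (3,) broadcast against (2, 1) gives (2, 3): every field for every value.
fields = (values >> shift) & (2**width - 1)

print(fields.shape)  # (2, 3)
print(fields)        # rows: [3, 2, 3] and [2, 1, 3]
```

Row i holds field i for all values, which matches the layout of d[0] above.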

You can also access the rows by name, similar to your dictionaries:

one, two, three, four = np.arange(4)
d[1][two]

Output

array([ 68, 479, 230, 295, 278, 455, 276,  45, 360, 488, 241, 336, 447,
       316, 181,  94, 138, 404, 223, 310])

To get the result exactly like the original solution:

actual = [
    {
        name: x[index].tolist()
        for index, name
        in enumerate(["one","two","three","four"])
    }
    for x in d
]

This produces exactly the same result as the original code while keeping roughly an order-of-magnitude speedup.
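
To check both claims on your own machine, something like the following could be used. It is only a sketch: the function names loop_version and vectorized_version, the seed, and the repeat count are assumptions, and the measured times will vary with hardware and NumPy version.

```python
import timeit

import numpy as np


def func(value, start=None, end=None):
    # Reference implementation from the question.
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)


data = np.random.default_rng(100).integers(1, 429496729, size=10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]
names = ["one", "two", "three", "four"]
fields = [(0, 8), (9, 17), (27, 27), (28, 28)]


def loop_version():
    # Per-value list comprehensions, as in the question.
    # .tolist() hands plain Python ints to func.
    out = []
    for start, stop in zip(starts, stops):
        chunk = data[np.arange(start, stop, 2)].tolist()
        out.append({
            name: [func(v, lo, hi) for v in chunk]
            for name, (lo, hi) in zip(names, fields)
        })
    return out


def vectorized_version():
    # Broadcast shift-and-mask, then convert to the same list-of-dicts layout.
    shift = np.array([lo for lo, _ in fields])[:, None]
    width = np.array([hi - lo + 1 for lo, hi in fields])[:, None]
    d = [(data[start:stop:2] >> shift) & (2**width - 1)
         for start, stop in zip(starts, stops)]
    return [{name: x[i].tolist() for i, name in enumerate(names)} for x in d]


same = loop_version() == vectorized_version()
print("identical results:", same)

# Timings are machine dependent; no particular ratio is guaranteed.
print("loop:      ", timeit.timeit(loop_version, number=200))
print("vectorized:", timeit.timeit(vectorized_version, number=200))
```

The slices here are short (10 to 35 elements each), so the fixed overhead of the NumPy calls is a noticeable part of the vectorized time; with longer slices the gap should widen.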

5 revs, 2 users 83% Michael Szczesny answered Oct 21 '25 20:10



