Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest Python equivalent to switch for array of integers

I want to find the fastest way to do the job of switch in C. I'm writing some Python code to replace C code, and it's all working fine except for a bottleneck. This code is used in a tight loop, so it really is quite crucial that I get the best performance.

Optimsation Attempt 1: First attempt, as per previous questions such as this suggest using hash tables for lookups. This ended up being incredibly slow.

Optimsation Attempt 2 Another optimisation I have made is to create a run of if ... return statements which gives me a 13% speed boost. It's still disappointingly slow.

Optimsation Attempt 3 I created an array.array of all possible input values, and did an index lookup. This results in an over-all speed up of 43%, which is respectable.

I'm running over an array.array using map and passing a transform function to it. This function is doing the lookup. My switch is working on short integers (it's a typed array). If this were GCC C, the compiler would create a jump table. It's frustrating to know that Python is either hashing my value to lookup a table entry or in the case of if, performing lots of comparisons. I know from profiling it that the slow functions are precisely the ones that are doing the look-up.

What is the absolute fastest way of mapping one integer to another, mapped over an array.array if relevant. Anything faster than the above?

EDIT

Although it makes me look like an idiot for only just realising, I will say it anwyay! Remember that running your code in a profiler slows your code down a lot. In my case, 19 times slower. Suddenly my bottleneck isn't so bad! Thanks very much everyone for all your answers. The question is still valid. I'll leave the question open for a bit because there may be some interesting answers.

With profiler, for my test set of data:

real    0m37.309s
user    0m33.263s
sys     0m4.002s

without:

real    0m2.595s
user    0m2.526s
sys     0m0.028s
like image 765
Joe Avatar asked Oct 23 '22 16:10

Joe


2 Answers

I think others are right to suggest numpy or pure c; but for pure python, here are some timings, for what they're worth. Based on these, I'm a bit surprised that array.array performed so much better than a dict. Are you creating these tables on the fly inside the loop? Or have I misunderstood something else about your question? In any case, this suggests that a list is actually the best way to go.

>>> def make_lookup_func(table):
...     def lookup(val, t=table):
...         return t[val]
...     return lookup
... 
>>> lookup_tuple = make_lookup_func(tuple(range(10)))
>>> lookup_list = make_lookup_func(list(range(10)))
>>> lookup_array = make_lookup_func(array.array('i', range(10)))
>>> lookup_dict = make_lookup_func(dict(zip(range(10), range(10))))
>>> %timeit lookup_tuple(9)
10000000 loops, best of 3: 177 ns per loop
>>> %timeit lookup_list(9)
10000000 loops, best of 3: 158 ns per loop
>>> %timeit lookup_array(9)
10000000 loops, best of 3: 181 ns per loop
>>> %timeit lookup_dict(9)
10000000 loops, best of 3: 166 ns per loop

Scaling behavior:

>>> lookup_tuple = make_lookup_func(tuple(range(10000)))
>>> lookup_list = make_lookup_func(list(range(10000)))
>>> lookup_array = make_lookup_func(array.array('i', range(10000)))
>>> lookup_dict = make_lookup_func(dict(zip(range(10000), range(10000))))
>>> %timeit lookup_tuple(9000)
10000000 loops, best of 3: 177 ns per loop
>>> %timeit lookup_list(9000)
10000000 loops, best of 3: 158 ns per loop
>>> %timeit lookup_array(9000)
10000000 loops, best of 3: 186 ns per loop
>>> %timeit lookup_dict(9000)
10000000 loops, best of 3: 195 ns per loop
like image 54
senderle Avatar answered Oct 27 '22 09:10

senderle


Branch logic in general can be painfully slow in python when used in this type of application and you basically struck on one of the better ways of doing this for a tight inner loop where you are converting between integers. A few more things to experiment with:

You might try would be working with np.array or using Cython (or just straight C) for the tight loop. These require some additional setup (and possibly writing the inner loop in C), but can also give tremendous speedups for this type of application and can let you take advantage of a good C optimizer.

Something that can go either way and is more of a micro-optimization is that you could try using a list comprehension instead of a map, or make sure you aren't using a lambda in your map. Not using a lambda in a map() is actually a pretty big one, while the difference between a list comprehension and a map tends to be relatively small otherwise.

like image 34
David H. Clements Avatar answered Oct 27 '22 09:10

David H. Clements