
Python Dictionary vs If Statement Speed

I have found a few links saying that switch statements are faster in C++ than if/else because they can be optimized at compile time. I then found some suggestions that using a dictionary may be faster than an if statement. However, most of those conversations are about someone's specific code and end up concluding that they should optimize other parts first, and that it won't matter unless you're doing millions of if/else checks. Can anyone explain why this is?

Say I have 100 unique numbers that are going to be streamed into a Python program constantly. I want to check which number it is and then execute something. So I could either write a long chain of if/else, or I could put each number in a dictionary. For argument's sake, let's say it's a single thread.

Does someone understand the layer between Python and the low-level execution who can explain how this works?

Thanks :)

asked Apr 10 '13 by user-2147482637




2 Answers

However, most of those conversations are about someone's specific code and end up concluding that they should optimize other parts first, and that it won't matter unless you're doing millions of if/else checks. Can anyone explain why this is?

Generally, you should only bother to optimize code if you really need to, i.e. if the program's performance is unusably slow.

If this is the case, you should use a profiler to determine which parts are actually causing the most problems. For Python, the cProfile module is pretty good for this.
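A minimal sketch of what that might look like (the function names here are hypothetical stand-ins for your own code):

```python
import cProfile
import io
import pstats

def slow_part():
    # Hypothetical hot spot: a quadratic loop
    total = 0
    for i in range(1000):
        for j in range(1000):
            total += i * j
    return total

def fast_part():
    return sum(range(1000))

def main():
    slow_part()
    fast_part()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Report the functions sorted by cumulative time spent in them
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

The report makes it obvious which function dominates the runtime, which is the question you should answer before micro-optimizing a dispatch mechanism.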

Does someone understand the layer between Python and the low-level execution who can explain how this works?

If you want to get an idea of how your code executes, take a look at the dis module.

A quick example...

import dis

# Here are the things we might want to do
def do_something_a():
    print('I did a')


def do_something_b():
    print('I did b')


def do_something_c():
    print('I did c')


# Case 1
def f1(x):
    if x == 1:
        do_something_a()
    elif x == 2:
        do_something_b()
    elif x == 3:
        do_something_c()


# Case 2
FUNC_MAP = {1: do_something_a, 2: do_something_b, 3: do_something_c}
def f2(x):
    FUNC_MAP[x]()


# Show how the functions execute
print('Case 1')
dis.dis(f1)
print('\n\nCase 2')
dis.dis(f2)

...which outputs the following (bytecode from CPython 2.7; newer versions emit different opcodes, but the comparison is the same)...

Case 1
 18           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 COMPARE_OP               2 (==)
              9 POP_JUMP_IF_FALSE       22

 19          12 LOAD_GLOBAL              0 (do_something_a)
             15 CALL_FUNCTION            0
             18 POP_TOP
             19 JUMP_FORWARD            44 (to 66)

 20     >>   22 LOAD_FAST                0 (x)
             25 LOAD_CONST               2 (2)
             28 COMPARE_OP               2 (==)
             31 POP_JUMP_IF_FALSE       44

 21          34 LOAD_GLOBAL              1 (do_something_b)
             37 CALL_FUNCTION            0
             40 POP_TOP
             41 JUMP_FORWARD            22 (to 66)

 22     >>   44 LOAD_FAST                0 (x)
             47 LOAD_CONST               3 (3)
             50 COMPARE_OP               2 (==)
             53 POP_JUMP_IF_FALSE       66

 23          56 LOAD_GLOBAL              2 (do_something_c)
             59 CALL_FUNCTION            0
             62 POP_TOP
             63 JUMP_FORWARD             0 (to 66)
        >>   66 LOAD_CONST               0 (None)
             69 RETURN_VALUE


Case 2
 29           0 LOAD_GLOBAL              0 (FUNC_MAP)
              3 LOAD_FAST                0 (x)
              6 BINARY_SUBSCR
              7 CALL_FUNCTION            0
             10 POP_TOP
             11 LOAD_CONST               0 (None)
             14 RETURN_VALUE

...so it's pretty easy to see which function has to execute the most instructions.

As for which is actually faster, that's something you'd have to check by profiling the code.
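For a quick check, the timeit module works well. A sketch, with hypothetical handler names; the actual numbers depend on your machine and Python version, and with only a few branches the difference may be tiny:

```python
import timeit

def handle_a(): pass
def handle_b(): pass
def handle_c(): pass

def dispatch_if(x):
    if x == 1:
        handle_a()
    elif x == 2:
        handle_b()
    elif x == 3:
        handle_c()

FUNC_MAP = {1: handle_a, 2: handle_b, 3: handle_c}

def dispatch_dict(x):
    FUNC_MAP[x]()

# Time looking up the *last* key, which is the worst case for the if chain
t_if = timeit.timeit(lambda: dispatch_if(3), number=100_000)
t_dict = timeit.timeit(lambda: dispatch_dict(3), number=100_000)
print(f"if/elif: {t_if:.4f}s  dict: {t_dict:.4f}s")
```

Note that the if chain's cost grows with the number of branches checked before a match, while the dict lookup stays roughly constant; with 100 keys, as in the question, the gap widens.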

answered Oct 19 '22 by Aya


The if/elif/else structure compares the key it was given to a sequence of possible values one by one until it finds a match in the condition of some if statement, then reads what it is supposed to execute from inside the if block. This can take a long time, because so many checks (n/2 on average, for n possible values) have to be made for every lookup.

The reason that a sequence of if statements is more difficult to optimize than a switch statement is that the condition checks (what's inside the parens in C++) might conceivably change the state of some variable that's involved in the next check, so you have to do them in order. The restrictions on switch statements remove that possibility, so the order doesn't matter (I think).
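To illustrate the ordering point in Python (with a hypothetical helper, not the C++ case discussed above): a condition that has a side effect forces the checks to run in source order, so the interpreter cannot reorder them or collapse them into a jump table.

```python
calls = []

def check(name, result):
    # A condition with a side effect: it records that it was evaluated
    calls.append(name)
    return result

x = 2
if check("first", x == 1):
    outcome = "a"
elif check("second", x == 2):
    outcome = "b"
elif check("third", x == 3):
    outcome = "c"

print(calls)    # ['first', 'second'] -- earlier checks ran in order; 'third' never did
print(outcome)  # b
```

Because each condition might depend on what the previous one changed, the chain must be evaluated strictly top to bottom.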


Python dictionaries are implemented as hash tables. The idea is this: if you could deal with arbitrarily large numbers and had infinite RAM, you could create a huge array of function pointers that is indexed just by casting whatever your lookup value is to an integer and using that as the index. Lookup would be virtually instantaneous.

You can't do that, of course, but you can create an array of some manageable length, pass the lookup value to a hash function (which generates some integer, depending on the lookup value), then % your result with the length of your array to get an index within the bounds of that array. That way, lookup takes as much time as is needed to call the hash function once, take the modulus, and jump to an index. If the number of different possible lookup values is large enough, the overhead of the hash function becomes negligible compared to those n/2 condition checks.

(Actually, since many different lookup values will inevitably map to the same index, it's not quite that simple. You have to check for and resolve possible conflicts, which can be done in a number of ways. Still, the gist of it is as described above.)
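A toy hash table with chaining sketches the gist (CPython's dict actually uses open addressing and several other optimizations, so this is illustrative only):

```python
class ToyHashTable:
    """Minimal chained hash table: hash, modulus, jump to a bucket,
    then resolve collisions by scanning the (usually short) chain."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # hash() gives an integer; % keeps it within the array bounds
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # new key, possibly a collision

    def get(self, key):
        # One hash, one modulus, then a short chain scan -- no n/2 checks
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

table = ToyHashTable()
table.put(42, "answer")
table.put(7, "lucky")
print(table.get(42))   # answer
```

The lookup cost is dominated by the single hash-and-modulus step rather than by how many keys the table holds, which is why the dict approach scales so well past a handful of cases.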

answered Oct 19 '22 by alcedine