
global vs. local namespace performance difference

Why is it that executing a set of commands in a function:

def main():
    [do stuff]
    return something
print(main())

will tend to run 1.5x to 3x faster in Python than executing the same commands at the top level:

[do stuff]
print(something)
asked Jan 11 '16 03:01 by user119264


People also ask

What is the difference between global and local variable?

A global variable is a variable that is accessible globally. A local variable is one that is only accessible to the current scope, such as temporary variables used in a single function definition.

Does Python compute local variables quicker?

Locals should be faster. When a line of code asks for the value of a variable x, Python will search for that variable in all the available namespaces, in order, starting with the local namespace - specific to the current function or class method.

Is it good to use global variables Python?

While in many or most other programming languages variables are treated as global if not declared otherwise, Python deals with variables the other way around. They are local, if not otherwise declared. The driving reason behind this approach is that global variables are generally bad practice and should be avoided.

What is the difference between nonlocal and global in Python?

An important difference between nonlocal and global is that a nonlocal variable must already be bound in the enclosing namespace (otherwise a SyntaxError will be raised), while a global declaration in a local scope does not require the variable to be pre-bound (it will create a new binding in the global ...


1 Answer

The difference does indeed greatly depend on what "do stuff" actually does, and mainly on how many times it accesses names that are defined/used. Granted that the code is otherwise similar, there is a fundamental difference between these two cases:

  • In functions, names are loaded/stored with the LOAD_FAST/STORE_FAST byte code instructions.
  • In the top level scope (i.e. the module), the same commands are performed with LOAD_NAME/STORE_NAME, which are more sluggish.

This can be seen in the following cases. I'll use a for loop to make sure that lookups for the defined variables are performed multiple times.

Function and LOAD_FAST/STORE_FAST:

We define a simple function that does some really silly things:

def main():
    b = 20
    for i in range(1000000): z = 10 * b 
    return z

Output generated by dis.dis:

dis.dis(main)
# [/snipped output/]

             18 GET_ITER
        >>   19 FOR_ITER                16 (to 38)
             22 STORE_FAST               1 (i)
             25 LOAD_CONST               3 (10)
             28 LOAD_FAST                0 (b)
             31 BINARY_MULTIPLY
             32 STORE_FAST               2 (z)
             35 JUMP_ABSOLUTE           19
        >>   38 POP_BLOCK

# [/snipped output/]

The thing to note here is the LOAD_FAST/STORE_FAST commands at offsets 28 and 32; these access the b name used in the BINARY_MULTIPLY operation and store the z name, respectively. As their names imply, they are the fast versions of the LOAD_*/STORE_* family.
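You can confirm this yourself without reading the full disassembly; a minimal sketch using `dis.get_instructions` (available since Python 3.4) to collect the opcode names emitted for the function body:

```python
import dis

def main():
    b = 20
    for i in range(1000000):
        z = 10 * b
    return z

# Collect the set of opcode names the compiler emitted for the function.
ops = {instr.opname for instr in dis.get_instructions(main)}

print("LOAD_FAST" in ops)   # local reads use the fast path
print("STORE_FAST" in ops)  # local writes too
print("LOAD_NAME" in ops)   # never appears inside a function body
```

Exact offsets and surrounding opcodes vary between Python versions, but the `*_FAST` instructions for local variables are consistent across them.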


Modules and LOAD_NAME/STORE_NAME:

Now, let's look at the output of dis for our module version of the previous function:

# compile the module
m = compile(open('main.py', 'r').read(), "main", "exec")

dis.dis(m)
# [/snipped output/]

             18 GET_ITER
        >>   19 FOR_ITER                16 (to 38)
             22 STORE_NAME               2 (i)
             25 LOAD_CONST               3 (10)
             28 LOAD_NAME                0 (b)
             31 BINARY_MULTIPLY
             32 STORE_NAME               3 (z)
             35 JUMP_ABSOLUTE           19
        >>   38 POP_BLOCK

# [/snipped output/]

Here we have multiple calls to LOAD_NAME/STORE_NAME which, as mentioned previously, are more sluggish commands to execute.
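The same opcode check can be run against module-level code; as a stand-in for reading main.py, the sketch below compiles an equivalent source string in 'exec' mode:

```python
import dis

# Module-level equivalent of the function body above.
src = "b = 20\nfor i in range(1000):\n    z = 10 * b\n"
code = compile(src, "<module>", "exec")

ops = {instr.opname for instr in dis.get_instructions(code)}

print("STORE_NAME" in ops)  # module-level writes go through the namespace dict
print("LOAD_NAME" in ops)   # so do reads (b, range)
print("STORE_FAST" in ops)  # no fast locals exist at module level
```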

In this case, there is going to be a clear difference in execution time, mainly because Python must evaluate these instructions many times (due to the for loop I added), so the overhead introduced each time a byte code instruction is executed accumulates.

Timing the execution 'as a module':

start_time = time.time()
b = 20 
for i in range(1000000): z = 10 * b
print(z)
print("Time: ", time.time() - start_time)
200
Time:  0.15162253379821777

Timing the execution as a function:

start_time = time.time()
print(main())
print("Time: ", time.time() - start_time)
200
Time:  0.08665871620178223 

If you time loops over a smaller range (for example for i in range(1000)) you'll notice that the 'module' version is faster. This happens because the overhead of calling the function main() is larger than the overhead introduced by the *_FAST vs *_NAME difference. So the difference is largely relative to the amount of work done.
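The comparison above can be reproduced in one self-contained script. Note that module-style code must be run via exec of code compiled in 'exec' mode to keep the LOAD_NAME/STORE_NAME instructions (wrapping it in a helper function would silently switch it to the fast path); exact numbers will vary by machine:

```python
import time

# Module-style code: compiled in 'exec' mode, so it uses LOAD_NAME/STORE_NAME.
loop_src = "b = 20\nfor i in range(1000000):\n    z = 10 * b\n"
module_code = compile(loop_src, "<module>", "exec")

def main():
    # Same work, but as a function: LOAD_FAST/STORE_FAST.
    b = 20
    for i in range(1000000):
        z = 10 * b
    return z

start = time.perf_counter()
exec(module_code, {})
module_time = time.perf_counter() - start

start = time.perf_counter()
main()
function_time = time.perf_counter() - start

print(f"module:   {module_time:.4f}s")
print(f"function: {function_time:.4f}s")
```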

So, the real culprit here, and the reason why this difference is so evident, is the for loop. You generally have no reason to ever put an intensive loop like that one at the top level of your script. Move it into a function and avoid using global variables; functions are designed to be more efficient.
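In practice that means the standard main() idiom: put the heavy work in a function and call it from a guard at the bottom of the script. A minimal sketch (the loop body and constants are just placeholders):

```python
import time

def main():
    # All heavy work runs with fast, array-indexed local lookups.
    b = 20
    total = 0
    for i in range(1000000):
        total += 10 * b
    return total

if __name__ == "__main__":
    start = time.perf_counter()
    result = main()
    print(result, f"{time.perf_counter() - start:.4f}s")
```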


You can take a look at the C code executed for each byte code instruction. I'll link the source for the 3.5 version of Python here, even though I'm pretty sure 2.7 doesn't differ much. Bytecode evaluation is done in Python/ceval.c, specifically in the function PyEval_EvalFrameEx:

  • LOAD_FAST source - STORE_FAST source
  • LOAD_NAME source - STORE_NAME source

As you'll see, the *_FAST bytecodes simply get the value stored/loaded from fastlocals, an array-based local symbol table contained inside frame objects that is indexed directly (the slots are assigned at compile time), whereas the *_NAME bytecodes must perform dictionary lookups.
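This split is visible from Python as well: a code object's co_varnames tuple lists the fast-local slots assigned at compile time, while co_names lists the names resolved by dictionary lookup. A quick check:

```python
def main():
    b = 20
    z = 10 * b
    return z

# A function's locals get compile-time slots in co_varnames...
print(main.__code__.co_varnames)   # ('b', 'z')

# ...while module-level code has no fast locals at all: its names
# go in co_names and are resolved through the namespace dict.
module_code = compile("b = 20\nz = 10 * b\n", "<module>", "exec")
print(module_code.co_varnames)     # ()
print("b" in module_code.co_names and "z" in module_code.co_names)  # True
```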

answered Oct 19 '22 22:10 by Dimitris Fasarakis Hilliard