
Comparison: import statement vs __import__ function

As a follow-up to the question Using builtin __import__() in normal cases, I ran a few tests and came across surprising results.

Here I compare the execution time of a classical import statement with that of a call to the __import__ built-in function. For this purpose, I use the following script in interactive mode:

import timeit   

def test(module):    
    t1 = timeit.timeit("import {}".format(module))
    t2 = timeit.timeit("{0} = __import__('{0}')".format(module))
    print("import statement:   ", t1)
    print("__import__ function:", t2)
    print("t(statement) {} t(function)".format("<" if t1 < t2 else ">"))

As in the linked question, here is the comparison when importing sys, along with some other standard modules:

>>> test('sys')
import statement:    0.319865173171288
__import__ function: 0.38428380458522987
t(statement) < t(function)

>>> test('math')
import statement:    0.10262547545597034
__import__ function: 0.16307580163101054
t(statement) < t(function)

>>> test('os')
import statement:    0.10251490255312312
__import__ function: 0.16240755669640627
t(statement) < t(function)

>>> test('threading')
import statement:    0.11349136644972191
__import__ function: 0.1673617034957573
t(statement) < t(function)

So far so good: import is faster than __import__(). This makes sense to me because, as I wrote in the linked post, I find it logical that the IMPORT_NAME instruction is optimized compared with CALL_FUNCTION when the latter results in a call to __import__.

But when it comes to less standard modules, the results reverse:

>>> test('numpy')
import statement:    0.18907936340054476
__import__ function: 0.15840019037769792
t(statement) > t(function)

>>> test('tkinter')
import statement:    0.3798560809537861
__import__ function: 0.15899962771786136
t(statement) > t(function)

>>> test("pygame")
import statement:    0.6624641952621317
__import__ function: 0.16268579177259568
t(statement) > t(function)

What is the reason behind this difference in the execution times? What is the actual reason why the import statement is faster on standard modules? On the other hand, why is the __import__ function faster with other modules?

Tests run with Python 3.6.

Right leg asked Sep 12 '17 12:09


3 Answers

timeit measures the total execution time, but the first import of a module, whether through import or __import__, is slower than subsequent ones - because it's the only one that actually performs module initialization. It has to search the filesystem for the module's file(s), load the module's source code (slowest) or previously created bytecode (slow but a bit faster than parsing the .py files) or shared library (for C extensions), execute the initialization code, and store the module object in sys.modules. Subsequent imports get to skip all that and retrieve the module object from sys.modules.
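The caching described above can be observed directly. This is a minimal sketch, assuming the small standard-library module colorsys stands in for an arbitrary module; the helper name timed_import is hypothetical and not part of the original tests:

```python
import importlib
import sys
import time

def timed_import(name):
    """Import `name` once and return (module, elapsed seconds)."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start

name = "colorsys"            # assumption: any small stdlib module would do
sys.modules.pop(name, None)  # drop a cached copy so the first import is a real one

first_module, first = timed_import(name)    # finds, loads and executes the module
second_module, second = timed_import(name)  # served straight from sys.modules

print("first import: ", first)
print("cached import:", second)
print("same object:  ", first_module is second_module)
```

The exact timings depend on the machine, but both calls return the same module object, which is what a cached lookup implies.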

If you reverse the order the results will be different:

import timeit   

def test(module):    
    t2 = timeit.timeit("{0} = __import__('{0}')".format(module))
    t1 = timeit.timeit("import {}".format(module))
    print("import statement:   ", t1)
    print("__import__ function:", t2)
    print("t(statement) {} t(function)".format("<" if t1 < t2 else ">"))

test('numpy')
import statement:    0.4611093703134608
__import__ function: 1.275512785926014
t(statement) < t(function)

The best way to get non-biased results is to import it once and then do the timings:

import timeit   

def test(module):    
    exec("import {}".format(module))
    t2 = timeit.timeit("{0} = __import__('{0}')".format(module))
    t1 = timeit.timeit("import {}".format(module))
    print("import statement:   ", t1)
    print("__import__ function:", t2)
    print("t(statement) {} t(function)".format("<" if t1 < t2 else ">"))

test('numpy')
import statement:    0.4826306561727307
__import__ function: 0.9192819125911029
t(statement) < t(function)

So yes: once the module is already cached in sys.modules, the import statement is faster than __import__ in these measurements.
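A single timeit.timeit run can be noisy. As a hedged variant of the snippet above, the sketch below warms the cache first and keeps the minimum over several repeats; the function name compare and its number and repeat defaults are assumptions chosen for illustration:

```python
import importlib
import timeit

def compare(module, number=20000, repeat=5):
    """Return (best import-statement time, best __import__ time) for a cached module."""
    importlib.import_module(module)  # warm sys.modules so no run pays for the first import
    stmt = min(timeit.repeat("import {}".format(module),
                             number=number, repeat=repeat))
    func = min(timeit.repeat("{0} = __import__('{0}')".format(module),
                             number=number, repeat=repeat))
    return stmt, func

stmt, func = compare("math")
print("import statement:   ", stmt)
print("__import__ function:", func)
```

Taking the minimum is the usual way to reduce interference from other processes; the absolute numbers are not comparable with the timings quoted in this thread because number differs from the timeit default.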

MSeifert answered Oct 17 '22 23:10


Remember that all modules get cached into sys.modules after the first import, so the time...

Anyway, my results look like this:

#!/bin/bash

itest() {
    echo -n "import $1: "
    python3 -m timeit "import $1"
    echo -n "__import__('$1'): "
    python3 -m timeit "__import__('$1')"
}

itest "sys"
itest "math"
itest "six"
itest "PIL"
  • import sys: 0.481
  • __import__('sys'): 0.586
  • import math: 0.163
  • __import__('math'): 0.247
  • import six: 0.157
  • __import__('six'): 0.273
  • import PIL: 0.162
  • __import__('PIL'): 0.265

AKX answered Oct 17 '22 22:10


What is the reason behind this difference in the execution times?

The import statement has a pretty straightforward path to go through. It leads to IMPORT_NAME, which calls import_name and imports the given module (if the name __import__ has not been overridden):

dis('import math')
  1           0 LOAD_CONST               0 (0)
              2 LOAD_CONST               1 (None)
              4 IMPORT_NAME              0 (math)
              6 STORE_NAME               0 (math)
              8 LOAD_CONST               1 (None)
             10 RETURN_VALUE
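The remark about overriding __import__ can be checked with a short sketch. It assumes a hypothetical tracing hook, tracing_import, temporarily installed in the builtins module; the import statement then goes through that hook instead of the default importer:

```python
import builtins

calls = []                             # names seen by the hook
original_import = builtins.__import__  # keep a reference so it can be restored

def tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Hypothetical hook: record the requested name, then defer to the real importer."""
    calls.append(name)
    return original_import(name, globals, locals, fromlist, level)

builtins.__import__ = tracing_import
try:
    import math                        # compiled to IMPORT_NAME, routed through the hook
finally:
    builtins.__import__ = original_import  # always restore the default

print(calls)
```

Replacing builtins.__import__ affects every import in the process while the hook is installed, so this is only suitable as an experiment.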

__import__, on the other hand, goes through the generic function call steps that all functions do via CALL_FUNCTION:

dis("__import__('math')")
  1           0 LOAD_NAME                0 (__import__)
              2 LOAD_CONST               0 ('math')
              4 CALL_FUNCTION            1
              6 RETURN_VALUE

Sure, it's a builtin and therefore faster than ordinary Python functions, but it is still slower than the import statement, which goes through import_name.

This is why the difference in time between them is constant. Using @MSeifert's snippet (which corrected the unfair timings :-) and adding another print, you can see this:

import sys
import timeit

def test(module):    
    exec("import {}".format(module))
    t2 = timeit.timeit("{0} = __import__('{0}')".format(module))
    t1 = timeit.timeit("import {}".format(module))
    print("import statement:   ", t1)
    print("__import__ function:", t2)
    print("t(statement) {} t(function)".format("<" if t1 < t2 else ">"))
    print('Diff: {}'.format(t2-t1))


for m in sys.builtin_module_names:
    test(m)

On my machine, there's a roughly constant diff of around 0.17 between them (with the slight variance that's generally expected).

*It is worth noting that these aren't exactly equivalent. __import__ doesn't do any name binding as the bytecode attests.
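The difference in name binding can be shown with a minimal sketch. The dictionary namespace and the choice of colorsys are assumptions made for illustration; the last part also shows a second documented difference, namely that __import__ returns the top-level package for a dotted name:

```python
import importlib

namespace = {}

# Calling __import__ returns the module object but binds no name.
exec("__import__('colorsys')", namespace)
bound_by_call = "colorsys" in namespace

# The import statement ends with STORE_NAME, so the name is bound.
exec("import colorsys", namespace)
bound_by_statement = "colorsys" in namespace

print("bound after __import__ call: ", bound_by_call)
print("bound after import statement:", bound_by_statement)

# For a dotted name, __import__ returns the top-level package,
# while importlib.import_module returns the submodule itself.
top = __import__("os.path")
sub = importlib.import_module("os.path")
print(top.__name__)
print(sub is top.path)
```

This is also why the timing snippets in this thread assign the result of __import__ to a name: without the assignment the call does less work than the statement.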

Dimitris Fasarakis Hilliard answered Oct 17 '22 22:10