These two questions concern using import inside a function vs. at the top of a module. I do not need to be convinced to put my imports at the top; there are good reasons to do that. However, to better understand the technical issues, I would like to ask a follow-up.
Can you get the best of both worlds performance-wise by using a closure and importing only on the first run of the function?
Specifically, suppose you have code such as:
import sys
def get_version():
    return sys.version
You want the import to only happen if the function ever gets called, so you move it inside:
def get_version():
    import sys
    return sys.version
But now it is slow if it does get called a lot, so you try something more complex:
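def _get_version():
    import sys  # runs only on the first call
    def nested():
        return sys.version  # sys is a closure variable here
    global get_version
    get_version = nested  # later calls go straight to nested()
    return nested()
get_version = _get_version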
At least a basic performance test indicates that this last option is slightly slower than the first (taking ~110% as long) but much faster than the second (taking ~20% as long).
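A comparison along these lines could be sketched as follows. The harness is an assumption, not the exact test that produced the numbers above: the helper names `top_level` and `import_inside` and the repeat count `N` are made up for illustration. The statements are timed as strings so that the rebound global is actually looked up on each iteration.

```python
import sys
from timeit import timeit

def top_level():
    # Variant 1: relies on the module-level `import sys` above.
    return sys.version

def import_inside():
    # Variant 2: the import statement runs on every call.
    import sys
    return sys.version

def _get_version():
    # Variant 3: import once, then rebind the global name to a closure.
    import sys
    def nested():
        return sys.version
    global get_version
    get_version = nested
    return nested()

get_version = _get_version

N = 200_000  # arbitrary repeat count, chosen only to keep the run short

# The statements are passed as strings so the name `get_version` is looked
# up again on every iteration. Passing the function object itself, as in
# timeit(get_version), would keep calling the original _get_version.
timings = {
    "top_level": timeit("top_level()", globals=globals(), number=N),
    "import_inside": timeit("import_inside()", globals=globals(), number=N),
    "rebinding": timeit("get_version()", globals=globals(), number=N),
}

for name, seconds in timings.items():
    print(f"{name:>14}: {seconds:.4f} s")
```

Absolute numbers depend on the interpreter version and machine, so only the ratios between the three variants are meaningful.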
First, does this actually work? Do my measurements accurately show that the second example does more work, or is that an artifact of how I measured things?
Second, is there a slowdown from the closure – beyond the first time the function is run?
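Importing inside a function effectively imports the module only once, the first time the function is run. The import itself takes the same time whether it happens at the top of the module or when the function first runs. Each later call still executes the import statement, but that only looks the module up in sys.modules and binds a local name, which is much cheaper than a first import though not free.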
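The caching behaviour can be checked with a small sketch. The function name `get_json_module` is hypothetical, and `json` is used only because it is a convenient standard-library module:

```python
import sys

def get_json_module():
    # Function-level import: the first execution loads the module (unless it
    # is already loaded); later executions find it in sys.modules.
    import json
    return json

first = get_json_module()
second = get_json_module()

# Both calls return the very same cached module object.
print(first is second)
print(sys.modules["json"] is first)
```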
When a module is first imported, Python searches for the module and, if it is found, creates a module object and initializes it. If the named module cannot be found, a ModuleNotFoundError is raised. Python implements various strategies to search for the named module when the import machinery is invoked.
Globals in Python are global to a module, not across all modules. (Unlike C, where a global is the same across all implementation files unless you explicitly make it static.) If you need truly global variables from imported modules, you can set those as attributes of the module you are importing.
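A minimal sketch of per-module globals follows. To stay self-contained it builds two throwaway modules in memory with `types.ModuleType` instead of real files; the names `mod_a`, `mod_b`, `counter` and `read` are made up for illustration.

```python
import types

# Two in-memory modules standing in for separate .py files.
mod_a = types.ModuleType("mod_a")
mod_b = types.ModuleType("mod_b")

source = "def read():\n    return counter\n"
exec("counter = 1\n" + source, mod_a.__dict__)
exec("counter = 2\n" + source, mod_b.__dict__)

# Each read() resolves `counter` in its own module's namespace.
before = (mod_a.read(), mod_b.read())

# Setting an attribute on the module object changes that module's global only.
mod_a.counter = 10
after = (mod_a.read(), mod_b.read())

print(before, after)
```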
Closure dereferencing is not any faster than global lookups:
>>> import sys
>>> sys.version_info
sys.version_info(major=3, minor=6, micro=0, releaselevel='final', serial=0)
>>> from timeit import timeit
>>> glob = 'foo'
>>> def f1(): return glob
...
>>> def closure():
...     closed_over = 'bar'
...     def f2():
...         return closed_over
...     return f2
...
>>> f2 = closure()
>>> timeit(f1, number=10**7)
0.8623221110319719
>>> timeit(f2, number=10**7)
0.872071701916866
In addition, even if it were faster, the tradeoff against readability is not worth it, certainly not when faster options are available for when you really need to optimise code.
Locals are always the fastest option. If you really need to optimise code called from a tight loop, the proper hybrid is to use function argument defaults:
import sys
def get_version(_sys_version=sys.version):
    return _sys_version
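One consequence worth spelling out is that a default value is evaluated once, when the def statement runs, so this pattern needs the module-level import and does not defer it. A small sketch that inspects the stored default:

```python
import sys

def get_version(_sys_version=sys.version):
    # The default was evaluated once, when the def statement ran, and is
    # bound as a local name on every call.
    return _sys_version

# The captured value is stored on the function object itself.
stored_default = get_version.__defaults__[0]
print(stored_default == sys.version)
```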
If you are concerned with the impact of the initial file load from an import at startup time, perhaps you should look at the py-demandimport project instead, which postpones loading modules until the first time they are used.
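For comparison, the standard library offers a related mechanism in `importlib.util.LazyLoader`. The sketch below is not py-demandimport's API. It follows the lazy-import recipe from the importlib documentation, with an added check for modules that are already loaded, and uses `colorsys` as an arbitrary example module.

```python
import importlib.util
import sys

def lazy_import(name):
    """Return a module whose body runs only on first attribute access."""
    if name in sys.modules:
        # Already imported: nothing left to postpone.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # Wrap the real loader so that executing the module is deferred.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the lazy module, does not run it yet
    return module

colorsys = lazy_import("colorsys")
# The module body is executed here, on the first attribute access.
converter = colorsys.rgb_to_hsv
```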