I have created a number of personal libraries to help with my daily coding. Best practice is to put imports at the beginning of Python programs. But say I import my library, or even just a function or class from it. All of the modules the library imports get imported too (even if they are only used by classes or functions I never call). I assume this increases the overhead of the program?
One example. I have a library called pytools which looks something like this
import difflib

def foo():
    # uses difflib.SequenceMatcher
    pass

def bar():
    # benign function, e.g.
    print("Hello!")
    return True

class foobar:
    def __init__(self):
        print("New foobar")

    def ret_true(self):
        return True
The function foo uses difflib. Now say I am writing a new program that needs to use bar and foobar. I could either write
import pytools
...
item = pytools.foobar()
vals = pytools.bar()
or I could do
from pytools import foobar, bar
...
item = foobar()
vals = bar()
Does either choice reduce overhead or preclude the import of foo and its dependency on difflib? What if the import of difflib was inside the foo function?
The problem I am running into shows up when converting simple programs into executables that only use one or two classes or functions from my libraries: the executable ends up being 50 MB or so.
I have read through py2exe's optimizing size page and can optimize using some of its suggestions.
http://www.py2exe.org/index.cgi/OptimizingSize
I guess I am really asking for best practice here. Is there some way to preclude the import of libraries that are only needed by unused functions or classes? I've watched import statements execute using a debugger and it appears that Python only "picks up" the line with "def somefunction" before moving on. Is the body of the function or class not executed until it is actually used? This would mean putting heavy imports at the beginning of a function or class could reduce overhead for the rest of the library.
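One way to check this empirically is to look at sys.modules before and after calling the function. The sketch below is only an illustration: it writes two throwaway modules to a temporary directory, and the names heavy_dep_demo (standing in for difflib) and pytools_demo are made up for the test.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical module names; heavy_dep_demo stands in for difflib.
tmpdir = tempfile.mkdtemp()

with open(os.path.join(tmpdir, "heavy_dep_demo.py"), "w") as f:
    f.write("VALUE = 42\n")

with open(os.path.join(tmpdir, "pytools_demo.py"), "w") as f:
    f.write(textwrap.dedent('''
        def foo():
            # This import runs only when foo() is called.
            import heavy_dep_demo
            return heavy_dep_demo.VALUE

        def bar():
            return True
    '''))

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

# The whole module body runs here, but only the "def" lines of foo and bar,
# not the function bodies.
from pytools_demo import bar, foo

bar()
loaded_after_import = "heavy_dep_demo" in sys.modules

foo()
loaded_after_call = "heavy_dep_demo" in sys.modules

print(loaded_after_import, loaded_after_call)  # False True
```

Note that import pytools and from pytools import foobar, bar both execute the entire module body, so a module-level import difflib runs either way. The two forms differ only in which names end up bound in the importing namespace.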
Startup and Module Importing Overhead. Starting a Python interpreter and importing Python modules is relatively slow if you care about milliseconds. If you need to start hundreds or thousands of Python processes as part of a workload, this overhead will add up to several seconds.
It's often useful to place import statements inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.
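To get a feel for the cost of re-executing an import statement, something like the following rough timeit comparison can be used. It is a sketch, not a benchmark: the function names are made up, math is just a convenient already-cached module, and the numbers will vary by machine and Python version.

```python
import math
import timeit

def with_local_import():
    # Re-executes the import statement on every call. After the first call
    # the module is already cached, so this is a cache lookup plus a local
    # name binding rather than a real load.
    import math
    return math.sqrt(2.0)

def with_module_level_import():
    # Uses the name bound once at module level.
    return math.sqrt(2.0)

n = 200_000
t_local = timeit.timeit(with_local_import, number=n)
t_global = timeit.timeit(with_module_level_import, number=n)

print(f"import inside function: {t_local:.3f} s for {n} calls")
print(f"module-level import:    {t_global:.3f} s for {n} calls")
```

The per-call difference is usually small, so it tends to matter only for functions called in tight loops.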
The importlib package provides the implementation of the import statement in Python source code portable to any Python interpreter. This also provides an implementation which is easier to comprehend than one implemented in a programming language other than Python.
__loader__ is an attribute that is set on an imported module by its loader. Accessing it should return the loader object itself. In Python versions before 3.3, __loader__ was not set by the built-in import machinery. Instead, this attribute was only available on modules that were imported using a custom loader.
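As a small illustration of the two points above, importlib.import_module performs an import from code, and the loader that handled the module can be inspected afterwards. This is a minimal sketch; difflib is used only because it appears in the question, and __loader__ is read with getattr as a precaution in case an interpreter does not set it.

```python
import importlib
import sys

# importlib.import_module is the programmatic form of the import statement.
module = importlib.import_module("difflib")

# The returned object is the same one cached in sys.modules.
same_object = module is sys.modules["difflib"]

# The loader is recorded on the module spec; __loader__ normally refers to
# the same object, but it is read defensively here.
spec_loader = module.__spec__.loader
module_loader = getattr(module, "__loader__", None)

print(same_object)
print(type(spec_loader).__name__)
print(module_loader is spec_loader)
```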
The only way to effectively reduce your dependencies is to split your tool box into smaller modules, and to only import the modules you need.
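As a rough sketch of what such a split could look like, the script below builds a throwaway package in a temporary directory and imports only one of its submodules. The package name pytools_split and the file names text.py and misc.py are hypothetical, chosen just for this illustration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: text.py holds foo and its difflib import,
# misc.py holds bar and foobar, which need no extra modules.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "pytools_split")
os.mkdir(pkg)

files = {
    # Kept empty so that importing the package itself loads nothing else.
    "__init__.py": "",
    "text.py": (
        "import difflib\n\n"
        "def foo(a, b):\n"
        "    return difflib.SequenceMatcher(None, a, b).ratio()\n"
    ),
    "misc.py": (
        "def bar():\n"
        "    return True\n\n"
        "class foobar:\n"
        "    def ret_true(self):\n"
        "        return True\n"
    ),
}
for name, source in files.items():
    with open(os.path.join(pkg, name), "w") as f:
        f.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()

# Only misc.py (and the empty __init__.py) is executed here.
from pytools_split.misc import bar, foobar

text_loaded = "pytools_split.text" in sys.modules
print(bar(), foobar().ret_true(), text_loaded)  # True True False
```

This only works if __init__.py does not itself import every submodule; otherwise importing any part of the package pulls in all of it again.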
Putting imports at the beginning of unused functions will prevent loading these modules at run-time, but is discouraged because it hides the dependencies. Moreover, your Python-to-executable converter will likely need to include these modules anyway, since Python's dynamic nature makes it impossible to statically determine which functions are actually called.
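To see why a function-level import does not hide a dependency from static analysis, here is a minimal sketch of an import scanner built on the standard ast module. It is not how py2exe or any particular converter is implemented (that is an assumption-free illustration only), and find_imported_modules is a made-up helper name.

```python
import ast

source = '''
import os

def foo():
    import difflib
    return difflib.SequenceMatcher

def bar():
    return True
'''

def find_imported_modules(code):
    """Return the names of all modules imported anywhere in the source."""
    found = set()
    # ast.walk visits every node, including those nested inside functions.
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found

print(sorted(find_imported_modules(source)))  # ['difflib', 'os']
```

The scanner reports difflib even though foo is never called, because it sees the import statement and cannot know whether foo will run.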