Let's say you have a project with several levels of nested folders, and in various places people have amended the PYTHONPATH for the whole project to make import statements cleaner.
This means that instead of writing:
from folder1.folder2.folder3 import foo
they can now say
from folder3 import foo
and add folder1/folder2 to the PYTHONPATH. The question is: if you keep this up and end up with a large number of paths in PYTHONPATH, does that carry an appreciable or significant performance hit?
To give a sense of scale, I'm asking about performance in terms of milliseconds at a minimum (i.e. 100 ms? 500 ms?).
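For concreteness, a minimal sketch of the setup being described (prepending to sys.path inside the process is the in-process equivalent of a PYTHONPATH entry; the folder names are the placeholders from above):

import sys

# Equivalent to PYTHONPATH="folder1/folder2:$PYTHONPATH" at the shell
sys.path.insert(0, "folder1/folder2")

from folder3 import foo  # instead of: from folder1.folder2.folder3 import foo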
The performance trade-off between having many different directories in your PYTHONPATH and having deeply nested package structures shows up in the system calls. Assume we have the following directory structures:
bash-3.2$ tree a
a
└── b
    └── c
        └── d
            └── __init__.py
bash-3.2$ tree e
e
├── __init__.py
├── __init__.pyc
└── f
    ├── __init__.py
    ├── __init__.pyc
    └── g
        ├── __init__.py
        ├── __init__.pyc
        └── h
            ├── __init__.py
            └── __init__.pyc
We can use these structures and the strace program to compare and contrast the system calls that we generate for the following commands:
strace python -c 'from e.f.g import h'
PYTHONPATH="./a/b/c:$PYTHONPATH" strace python -c 'import d'
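A convenient way to compare the two is to have strace write one system call per line to a file and count the lines; here is a rough sketch in Python (the trace file name and the count_syscalls helper are hypothetical):

import os
import subprocess

def count_syscalls(cmd, env=None):
    # strace -o writes one syscall per line to trace.out; count the lines.
    subprocess.check_call(["strace", "-o", "trace.out"] + cmd, env=env)
    with open("trace.out") as f:
        return sum(1 for _ in f)

deep = count_syscalls(["python", "-c", "from e.f.g import h"])
flat = count_syscalls(["python", "-c", "import d"],
                      env=dict(os.environ, PYTHONPATH="./a/b/c"))
print("deep package: %d syscalls, extra path entry: %d syscalls" % (deep, flat))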
So the trade-off here is really system calls at start-up time versus system calls at import time. For each entry in PYTHONPATH, Python first checks to see if the directory exists:
stat("./a/b/c", {st_mode=S_IFDIR|0776, st_size=4096, ...}) = 0
stat("./a/b/c", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
If the directory exists (it does, as indicated by the return value of 0 on the right), Python will search it for a number of modules when the interpreter starts. For each module it checks:
stat("./a/b/c/site", 0x7ffd900baaf0) = -1 ENOENT (No such file or directory)
open("./a/b/c/site.x86_64-linux-gnu.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("./a/b/c/site.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("./a/b/c/sitemodule.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("./a/b/c/site.py", O_RDONLY) = -1 ENOENT (No such file or directory)
open("./a/b/c/site.pyc", O_RDONLY) = -1 ENOENT (No such file or directory)
Each of these fails, and Python moves on to the next entry in the path, searching for the module in order. My 3.5 interpreter looked up 25 modules this way, producing an incremental 152 system calls on start-up per new PYTHONPATH entry.
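The pattern in that trace can be sketched as a simple search loop; this is an illustration of the behaviour above, not CPython's actual implementation:

import os

# Suffixes tried for each candidate module, in the order seen above
# (the platform-specific .so name will vary).
SUFFIXES = [".x86_64-linux-gnu.so", ".so", "module.so", ".py", ".pyc"]

def find_module(name, search_path):
    for entry in search_path:            # one pass per PYTHONPATH entry
        base = os.path.join(entry, name)
        if os.path.isdir(base):          # the stat() checking for a package
            return base
        for suffix in SUFFIXES:          # one failed open() per suffix
            candidate = base + suffix
            if os.path.isfile(candidate):
                return candidate
    return None  # the real interpreter raises ImportError here

The cost therefore scales with the number of path entries times the number of suffixes tried per module.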
The deep package structure pays no penalty on interpreter start-up, but when we import from the deeply nested package structure we do see a difference. As a baseline, here is the simple import of d/__init__.py from the a/b/c directory in our PYTHONPATH:
stat("/home/matt/a/b/c/d", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/matt/a/b/c/d/__init__.py", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
stat("/home/matt/a/b/c/d/__init__", 0x7ffd900ba990) = -1 ENOENT (No such file or directory)
open("/home/matt/a/b/c/d/__init__.x86_64-linux-gnu.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/matt/a/b/c/d/__init__.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/matt/a/b/c/d/__init__module.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/matt/a/b/c/d/__init__.py", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
open("/home/matt/a/b/c/d/__init__.pyc", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0664, st_size=117, ...}) = 0
read(4, "\3\363\r\n\17\3105[c\0\0\0\0\0\0\0\0\1\0\0\0@\0\0\0s\4\0\0\0d\0"..., 4096) = 117
fstat(4, {st_mode=S_IFREG|0664, st_size=117, ...}) = 0
read(4, "", 4096) = 0
close(4) = 0
close(3) = 0
Basically what this is doing is looking for the d package or module. When it finds d/__init__.py it opens it, and then opens d/__init__.pyc and reads the contents into memory before closing both files.
With our deeply nested package structure we have to repeat this operation 3 additional times, which is good for 15 system calls per directory for a total of 45 more system calls. While this is less than half the number of calls added by an additional PYTHONPATH entry, the read calls could potentially be more time-consuming than other system calls (or require more system calls) depending on the size of the __init__.py files.
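The repetition happens because each dotted level is an import in its own right; with the e/f/g/h layout above you can see every parent package land in sys.modules:

import sys
from e.f.g import h

# Every parent package paid the open/read sequence shown above once.
print(sorted(m for m in sys.modules if m == "e" or m.startswith("e.")))
# ['e', 'e.f', 'e.f.g', 'e.f.g.h']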
Taking this all into consideration, these differences are almost certainly not material enough to outweigh the design benefits of your desired solution.
This is especially true if your processes are long-running (like a web-app) rather than being short-lived.
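If you want concrete numbers in milliseconds on your own machine, a rough timing sketch along these lines answers the original question directly (the /tmp/pp scratch directories are hypothetical):

import os
import subprocess
import time

def startup_ms(n_entries):
    # Create real (empty) directories so each entry is actually searched.
    dirs = ["/tmp/pp%d" % i for i in range(n_entries)]
    for d in dirs:
        if not os.path.isdir(d):
            os.makedirs(d)
    env = dict(os.environ, PYTHONPATH=os.pathsep.join(dirs))
    start = time.time()
    subprocess.check_call(["python", "-c", "pass"], env=env)
    return (time.time() - start) * 1000

for n in (0, 10, 100):
    print("%3d extra entries: %.1f ms" % (n, startup_ms(n)))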
We can reduce the system calls by:
- reducing the number of PYTHONPATH entries
- pre-generating the .pyc files to avoid needing to write them (see the compileall sketch below)

We could more drastically improve performance by removing your .py files (kept around for debugging purposes) so they aren't read along with your .pyc files ... but this seems like a step too far to me.
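For the pre-compilation step, the standard library's compileall module does this; a minimal example (the directory name is a placeholder):

import compileall

# Compile every .py under the tree up front so the interpreter never
# has to write a .pyc at import time.
compileall.compile_dir("folder1", quiet=1)

The same is available from the command line as python -m compileall folder1.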
Hope this is useful; it's probably a far deeper dive than necessary.