Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python interpreter takes ~12 seconds to start up, all of which is spent in `import pyexpat`

I am using a Homebrew-installed Python on my Mac (running OS X 10.13.1), and of late, I’ve noticed that the interpreter takes a frustratingly long time to start up.

In setting out to try to solve this problem, I did a simple check with time:

PIPER-ALPHA:~$ time bpython -c 'pass'

real    0m12.141s
user    0m1.662s
sys     0m10.073s

… which revealed the egregiousness of the issue: 12 seconds!

I then used gnomon – an extremely handy npm module one uses for itemizing the timing of CLI tools – to winnow the issue down to the offending Python module. I used this command:

PIPER-ALPHA:~$ PYTHONVERBOSE=1 bpython -c 'pass' 2>&1 | tee -a /tmp/bpython-startup-messages | gnomon

… the gnomon output shows the time taken by each line issued by the verbose Python interpreter output. It looks like this:

verbose Python interpreter startup output timed with gnomon

… I have highlighted the line of output that took nearly twelve seconds to execute – by far the longest, as every single other line typically took a few nanoseconds, or at most a few μseconds, maybe.

Normally, if I encounter a wonky Python extension, I’ll recompile it myself, or otherwise tweak it from source, to properly render it unproblematic as necessary. But in this case, I am dealing with a c-extension module that is part of a larger Python standard-library module, all of which shipped with the Homebrew binary package (which is called a “bottle” in the Homebrew argot) that contained this version of Python.

Is this an issue anyone else can attest to? In particular, is it an issue anyone else sees when running Python under similar circumstances? And, most importantly, how might I fix it? Will I need to rebuild the entire Python installation, either using Homebrew or without it?

like image 685
fish2000 Avatar asked Oct 18 '22 02:10

fish2000


1 Answers

I have figured this out – the answer turns out to be simultaneously illuminating and embarrassing – and my solution may help others out when faced with similar circumstances.

In a nutshell: the aggravatingly long ~12s pauses I experienced while the Python interpreter loaded were being caused by having an excessively large amount of Python extension modules installed. It was not an issue with Python 2.7’s bundled xml.parsers.expat module, nor with its C-API pyexpat extension.

To wit: my use of the gnomon tool, which furnished what appeared to be direct and straightforward evidence pointing to these modules, ended up misleading me in my conclusions as to where the problematic code was to be found.

After posting my question, I did a bit of additional forensic poking-around. By varying the Python code I was passing to the interpreter while invoking the command-line speed checks, I found that the gnomon report would show the same twelve-plus-second halt, but at the incidence of different import statements. Further, I found that some command variants (e.g. those executed using the pythonpy CLT) weren’t plagued by the halting behavior at all.

I was able to pinpoint the lines of code responsible for the issues’ manifestation when I stumbled upon it by accident – while running my tests, the interminably long halts were no less annoying, and I ended up control-C’ing a number of tests in mid-halt. Those aborted test runs terminated with KeyboardInterrupt exceptions, and the accompanying stacktrace output revealed the function in which things were dragging:

exculpatory stacktrace

… the pkg_resources module, when imported, walks each of the extensions directories named in sys.path, enumerating each package in each extension, and subsequently reading in and then parsing all the associated metadata for all of those. Using any part of pkg_resources (which itself is part of the essential setuptools module) triggers this time-consuming action (which is then cached, at least, for the duration of that particular interpreter invocation’s lifetime). Depending on how your Python install is set up, and how you invoke your interpreter, you may or may not end up doing something to trigger the use of pkg_resources, but it is in pretty wide use across Python extension packages, so chances are good that it’ll get triggered by something.

The actual function responsible for the actual loop that actually enumerates the packages is _initialize_master_working_set() – it’s the one I’ve highlighted in the screenshot above. This is what all my KeyboardInterrupt stacktraces revealed. From there, it was immediately evident that the frustrating halts were a steeply linear function of the number of Cheese Shop packages present (something I had been reckless with after upgrading my laptop).

I immediately proceeded to pip-uninstall roughly 50% of the extensions I had gratuitously installed, and then pared down another 40% or so by hoisting most of my actively-developed Python stuff into isolated virtualenv project directories.

I felt pretty dumb afterward, as I had managed to cleverly mislead myself with my use of fancy analysis tools, and then found the actual solution by accident – one to a problem resulting from my own careless inattention, no less. Regardless, it is still something that could bite other Pythonic developers out there, and thus worth the writeup. You are hereby invited to learn from my circuitous adventures in issue-triage and diagnosis, indeed!

like image 134
fish2000 Avatar answered Oct 21 '22 09:10

fish2000