I'm trying to determine how to correctly use cProfile and pstats with asyncio code. I am profiling my script by running cProfile.run('loop.run_until_complete(main())', 'profile.stats'). After using pstats to sort by pstats.SortKey.TIME, I get the following output:
In [9]: sorted_stats.print_stats()
Fri Feb 15 14:40:18 2019    profile.txt

         67411199 function calls (67230882 primitive calls) in 305.899 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3617  273.162    0.076  273.162    0.076 {method 'poll' of 'select.epoll' objects}
14094092/14093931    1.527    0.000    1.904    0.000 {built-in method builtins.isinstance}
    44901    1.166    0.000    1.166    0.000 {built-in method posix.stat}
   784052    1.028    0.000    1.479    0.000 /usr/lib/python3/dist-packages/soupsieve/util.py:41(lower)
    32070    0.887    0.000    1.565    0.000 /usr/lib/python3/dist-packages/bs4/element.py:620(<genexpr>)
  4554545    0.729    0.000    0.756    0.000 /usr/lib/python3/dist-packages/bs4/element.py:1331(descendants)
  1026560    0.726    0.000    3.358    0.000 /usr/lib/python3/dist-packages/bs4/element.py:1502(search)
   448385    0.716    0.000    1.411    0.000 /usr/lib/python3/dist-packages/bs4/element.py:1528(_matches)
      104    0.689    0.007    4.771    0.046 /usr/lib/python3/dist-packages/bs4/builder/_lxml.py:283(feed)
   255143    0.655    0.000    1.575    0.000 /usr/lib/python3/dist-packages/bs4/__init__.py:391(endData)
   137377    0.520    0.000    3.692    0.000 /usr/lib/python3/dist-packages/html5lib/_tokenizer.py:49(__iter__)
     8574    0.510    0.000    2.392    0.000 /usr/lib/python3.7/traceback.py:318(extract)
   167700    0.494    0.000    1.061    0.000 /usr/lib/python3/dist-packages/bleach/html5lib_shim.py:199(charsUntil)
    53607    0.487    0.000    0.589    0.000 /usr/lib/python3/dist-packages/bs4/builder/__init__.py:158(_replace_cdata_list_attribute_values)
   299357    0.485    0.000    0.485    0.000 /usr/lib/python3/dist-packages/bs4/element.py:258(setup)
    22253    0.467    0.000    5.928    0.000 /usr/lib/python3/dist-packages/bs4/element.py:592(_find_all)
   448385    0.437    0.000    2.225    0.000 /usr/lib/python3/dist-packages/bs4/element.py:1461(search_tag)
    83147    0.424    0.000    2.365    0.000 /usr/lib/python3/dist-packages/bs4/builder/_lxml.py:176(start)
  1293015    0.418    0.000    0.556    0.000 /usr/lib/python3/dist-packages/soupsieve/css_match.py:94(is_tag)
  4824419    0.409    0.000    0.409    0.000 {method 'append' of 'list' objects}
    73068    0.404    0.000    0.634    0.000 /home/jess/.local/lib/python3.7/site-packages/rfc3986/normalizers.py:127(encode_component)
   868978    0.403    0.000    0.773    0.000 /usr/lib/python3/dist-packages/bleach/html5lib_shim.py:192(char)
   454702    0.385    0.000    0.385    0.000 {method 'match' of 're.Pattern' objects}
  2326981    0.374    0.000    0.415    0.000 {built-in method builtins.hasattr}
       52    0.334    0.006    5.732    0.110 /usr/lib/python3/dist-packages/html5lib/html5parser.py:196(mainLoop)
      ...
As you can see, almost all of the execution time is showing up as {method 'poll' of 'select.epoll' objects}. I don't have enough experience with cProfile or asyncio to know how to interpret this result. I can't tell whether the stats show my program spending the majority of its execution time in poll() because (a) cProfile doesn't work well with asyncio and isn't giving correct stats, (b) this is normal behavior and I should just ignore the first line, or (c) this is an indicator of some sort of bug in my program (which is indeed terribly slow right now, for reasons I have yet to determine).
Is it normal for async processes to spend this much execution time in poll()? Is there a better way to profile async code, either with cProfile or something else, that will give me more accurate results? Or is this a sign that there is likely some sort of bug in my code causing this?
NOTE: for context, the code I was profiling basically fetches 50 pages, processes the HTML, and then does some text/language processing. The fetch() part could explain a good portion of the time in poll(). But this is on a fast internet connection (> 5 MB/sec), so I don't think it should be spending that percentage of the time fetching just a few dozen text/html pages. It seems like something else is going on, or the stats aren't correct.
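For reference, here is a minimal, runnable sketch of my profiling setup (main() is just a placeholder coroutine standing in for the real fetch/parse workload):

```python
import asyncio
import cProfile
import pstats

async def main():
    # placeholder for the real workload (fetching and parsing pages)
    await asyncio.sleep(0.1)

loop = asyncio.new_event_loop()
cProfile.runctx("loop.run_until_complete(main())",
                globals(), locals(), "profile.stats")
loop.close()

# load the saved stats and sort by internal time, as in the output above
sorted_stats = pstats.Stats("profile.stats").sort_stats(pstats.SortKey.TIME)
sorted_stats.print_stats(10)  # show only the top 10 entries
```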
TL;DR: use better tools
I've found that it's more practical to profile something when it's reproducible, because then any code change can be validated (whether it improves performance or not).
If you can convert your code into a test case using pytest, then please consider using https://pypi.org/project/pytest-profiling/; it's easy to set up, and the visual output is relatively easy to understand.
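As a hypothetical sketch of such a test case (process_pages here is a stand-in for your real fetch/parse pipeline; with pytest-profiling installed, run it via pytest --profile, or --profile-svg for a call graph):

```python
# test_pipeline.py -- run with: pytest --profile  (or --profile-svg)
import asyncio

async def process_pages(urls):
    # stand-in for the real fetch + HTML/text processing (hypothetical)
    await asyncio.sleep(0.01)
    return len(urls)

def test_process_pages():
    # pytest-profiling collects cumulative stats for this test run
    result = asyncio.run(process_pages(["https://example.com"] * 3))
    assert result == 3
```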
Another option is to use a statistical profiler, like https://pypi.org/project/pprofile/.
Yet another, if you fancy profiling in production, is to use a service like https://docs.datadoghq.com/tracing/profiler/.
Finally, be clear about what you measure. Most profilers can collect both wall-clock time and CPU time. A naïve asyncio program is single-threaded and, given enough CPU share, is in one of two states at any given moment: either crunching or waiting. If your "problem" is too much crunching, then a profiler is the answer, and often CPU time and wall time are roughly the same. If the "problem" is actually waiting, then you probably want to look at who is waiting for what. Shameless plug: use my https://pypi.org/project/awaitwhat/ for Python 3.7~3.9.
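To see the "waiting" case concretely, here is a minimal sketch: an asyncio program that does no CPU work at all attributes essentially all of its wall-clock time to the event loop's selector (the {method 'poll' of 'select.epoll' objects} line on Linux; macOS and Windows use different selectors, so the exact entry differs):

```python
import asyncio
import cProfile
import io
import pstats

async def mostly_waiting():
    # no crunching at all -- just wait on the event loop
    await asyncio.sleep(0.2)

prof = cProfile.Profile()  # default timer measures wall-clock time
prof.enable()
asyncio.run(mostly_waiting())
prof.disable()

out = io.StringIO()
stats = pstats.Stats(prof, stream=out).sort_stats("tottime")
stats.print_stats(5)
print(out.getvalue())  # the top entry is the selector's poll/select call
```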