In a previous question I asked about multiprocessing, i.e. using multiple cores to make a program run faster, and someone told me this:
More often than not, you can get a 100x+ optimization with better code compared to a 4x improvement and additional complexities with multiprocessing
They then recommended that I should:
Use a profiler to understand what is slow, then focus on optimizing that.
So I went to this question: How can you profile a script? There I found cProfile, and I added it to some test code to see how it works.
This is my code:
import cProfile

def foo():
    for i in range(10000):
        a = i**i
        if i % 1000 == 0:
            print(i)

cProfile.run('foo()')
However, after running it, this was what I got:
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
         1018 function calls in 20.773 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   20.773   20.773 <string>:1(<module>)
      147    0.000    0.000    0.000    0.000 rpc.py:150(debug)
       21    0.000    0.000    0.050    0.002 rpc.py:213(remotecall)
       21    0.000    0.000    0.002    0.000 rpc.py:223(asynccall)
       21    0.000    0.000    0.048    0.002 rpc.py:243(asyncreturn)
       21    0.000    0.000    0.000    0.000 rpc.py:249(decoderesponse)
       21    0.000    0.000    0.048    0.002 rpc.py:287(getresponse)
       21    0.000    0.000    0.000    0.000 rpc.py:295(_proxify)
       21    0.001    0.000    0.048    0.002 rpc.py:303(_getresponse)
       21    0.000    0.000    0.000    0.000 rpc.py:325(newseq)
       21    0.000    0.000    0.002    0.000 rpc.py:329(putmessage)
       21    0.000    0.000    0.000    0.000 rpc.py:55(dumps)
       20    0.000    0.000    0.001    0.000 rpc.py:556(__getattr__)
        1    0.000    0.000    0.001    0.001 rpc.py:574(__getmethods)
       20    0.000    0.000    0.000    0.000 rpc.py:598(__init__)
       20    0.000    0.000    0.050    0.002 rpc.py:603(__call__)
       20    0.000    0.000    0.051    0.003 run.py:340(write)
        1   20.722   20.722   20.773   20.773 test.py:3(foo)
       42    0.000    0.000    0.000    0.000 threading.py:1226(current_thread)
       21    0.000    0.000    0.000    0.000 threading.py:215(__init__)
       21    0.000    0.000    0.047    0.002 threading.py:263(wait)
       21    0.000    0.000    0.000    0.000 threading.py:74(RLock)
       21    0.000    0.000    0.000    0.000 {built-in method _struct.pack}
       21    0.000    0.000    0.000    0.000 {built-in method _thread.allocate_lock}
       42    0.000    0.000    0.000    0.000 {built-in method _thread.get_ident}
        1    0.000    0.000   20.773   20.773 {built-in method builtins.exec}
       42    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
       63    0.000    0.000    0.000    0.000 {built-in method builtins.len}
       10    0.000    0.000    0.051    0.005 {built-in method builtins.print}
       21    0.000    0.000    0.000    0.000 {built-in method select.select}
       21    0.000    0.000    0.000    0.000 {method '_acquire_restore' of '_thread.RLock' objects}
       21    0.000    0.000    0.000    0.000 {method '_is_owned' of '_thread.RLock' objects}
       21    0.000    0.000    0.000    0.000 {method '_release_save' of '_thread.RLock' objects}
       21    0.000    0.000    0.000    0.000 {method 'acquire' of '_thread.RLock' objects}
       42    0.047    0.001    0.047    0.001 {method 'acquire' of '_thread.lock' objects}
       21    0.000    0.000    0.000    0.000 {method 'append' of 'collections.deque' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       21    0.000    0.000    0.000    0.000 {method 'dump' of '_pickle.Pickler' objects}
       20    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
       21    0.000    0.000    0.000    0.000 {method 'getvalue' of '_io.BytesIO' objects}
       21    0.000    0.000    0.000    0.000 {method 'release' of '_thread.RLock' objects}
       21    0.001    0.000    0.001    0.000 {method 'send' of '_socket.socket' objects}
I was expecting it to show me which parts of my code were taking the longest, for example that the line a = i**i was the slowest to compute. However, all I can gather from the output is that the foo() function as a whole took the longest; what took the longest inside that function, I have no idea from this data.
Also, when I use this on my actual code, it does the same thing. Everything is in functions, so it only tells me which functions take the longest, not what inside each function is taking so long.
So here are my main questions:
How do I see what inside the function is making the code take so long? (Should I even use cProfile?)
What is the best way to set about optimizing my code once I know what is using the most CPU?
Note: My RAM, disk, etc. are absolutely fine; it is just the CPU that is maxed out (12% total CPU, because it only runs on a single core).
If your function is relatively long-running, so you don't want to call it repeatedly, you can simply record the current time with start = time.process_time() (or time.time()) before the call, get the current time again after the call, and take the difference as the elapsed time.
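A minimal sketch of that manual-timing approach (slow_function here is a hypothetical stand-in for whatever you want to measure):

```python
import time

def slow_function():
    # Placeholder workload; substitute the function you want to time.
    total = 0
    for i in range(100000):
        total += i * i
    return total

start = time.process_time()   # CPU time; use time.time() for wall-clock time
result = slow_function()
elapsed = time.process_time() - start
print(f"slow_function took {elapsed:.4f} s of CPU time")
```

process_time() counts only CPU time of the current process, which matches the "CPU is maxed out" situation better than wall-clock time.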
Bottleneck is a collection of fast NumPy array functions written in C. Let's give it a try. Create a NumPy array:

>>> import numpy as np
>>> a = np.array([1, 2, np.nan, 4, 5])

Find the nanmean:

>>> import bottleneck as bn
>>> bn.nanmean(a)
3.0
How do I see what inside the function is making the code take so long (Should I even use cProfile?)
Yes, you can use cProfile, but the way you asked the question makes me wonder if line_profiler (a third-party module, you need to install it) wouldn't be a better tool.
I'm using the IPython/Jupyter bindings of this package when I want to profile a function:
%load_ext line_profiler
To actually profile a function:
%lprun -f foo foo()
#             ^^^^^---- this call will be profiled
#         ^^^---------- function to profile
Which produces this output:
Timer unit: 5.58547e-07 s

Total time: 17.1189 s
File: <ipython-input-1-21b5a5f52f66>
Function: foo at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def foo():
     2     10001        31906      3.2      0.1      for i in range(10000):
     3     10000     30534065   3053.4     99.6          a = i**i
     4     10000        75998      7.6      0.2          if i % 1000 == 0:
     5        10         6953    695.3      0.0              print(i)
That includes several interesting things. For example, 99.6% of the time is spent in the a = i**i line.
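You can sanity-check that conclusion with nothing but the standard library's timeit, by timing the expensive line separately from the loop bookkeeping (a rough sketch, reusing the loop from the question):

```python
import timeit

# Time just the suspected hot line across the same range of i values.
pow_time = timeit.timeit("for i in range(10000): a = i**i", number=1)

# Time the bookkeeping (loop + modulo test) without the power.
rest_time = timeit.timeit(
    "for i in range(10000):\n    if i % 1000 == 0: pass", number=1)

print(f"i**i loop: {pow_time:.3f} s")
print(f"loop only: {rest_time:.6f} s")
```

The i**i loop should dominate by several orders of magnitude, matching the 99.6% figure from line_profiler.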
- What is the best way to set about optimizing my code once I know what is using the most CPU?
That depends. Sometimes you need different functions, data structures, or algorithms, and sometimes you can't do anything. But at least you know where your bottleneck is, and you can estimate how much impact a change at the bottleneck (or somewhere else) would have.
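As one hypothetical illustration of a data-structure swap: membership tests against a list scan it element by element, while a set uses hashing, so a one-time conversion can shrink a hot loop dramatically:

```python
import time

needles = list(range(5000))
haystack_list = list(range(5000))
haystack_set = set(haystack_list)   # one-time conversion

t0 = time.process_time()
hits_list = sum(1 for n in needles if n in haystack_list)  # linear scan per lookup
t_list = time.process_time() - t0

t0 = time.process_time()
hits_set = sum(1 for n in needles if n in haystack_set)    # hash lookup
t_set = time.process_time() - t0

print(hits_list, hits_set)          # same answer, very different cost
print(f"list: {t_list:.4f} s, set: {t_set:.6f} s")
```

A profiler tells you which loop is hot; whether a cheap substitution like this exists is what "that depends" is about.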
As you have noticed in the profiling log, cProfile's maximum resolution is the function. So if you need to see which lines inside a function are slow, use line_profiler instead.