Usually I use shell command time
. My purpose is to test if data is small, medium, large or very large set, how much time and memory usage will be.
Any tools for Linux or just Python to do this?
Python module for benchmarkingIn Python, we have a by default module for benchmarking which is called timeit. With the help of the timeit module, we can measure the performance of small bit of Python code within our main program.
the main idea behind benchmarking or profiling is to figure out how fast your code executes and where the bottlenecks are. the main reason to do this sort of thing is for optimization. you will run into situations where you need your code to run faster because your business needs have changed.
Steps for benchmarking code using BenchmarkDotNetAdd the necessary NuGet package. Add Benchmark attributes to your methods. Create a BenchmarkRunner instance. Run the application in Release mode.
To run the benchmarks you simply use pytest to run your “tests”. The plugin will automatically do the benchmarking and generate a result table. Run pytest --help for more details.
Have a look at timeit, the python profiler and pycallgraph. Also make sure to have a look at the comment below by nikicc
mentioning "SnakeViz". It gives you yet another visualisation of profiling data which can be helpful.
def test(): """Stupid test function""" lst = [] for i in range(100): lst.append(i) if __name__ == '__main__': import timeit print(timeit.timeit("test()", setup="from __main__ import test")) # For Python>=3.5 one can also write: print(timeit.timeit("test()", globals=locals()))
Essentially, you can pass it python code as a string parameter, and it will run in the specified amount of times and prints the execution time. The important bits from the docs:
timeit.timeit(stmt='pass', setup='pass', timer=<default timer>, number=1000000, globals=None)
Create aTimer
instance with the given statement, setup code and timer function and run itstimeit
method with number executions. The optional globals argument specifies a namespace in which to execute the code.
... and:
Timer.timeit(number=1000000)
Time number executions of the main statement. This executes the setup statement once, and then returns the time it takes to execute the main statement a number of times, measured in seconds as a float. The argument is the number of times through the loop, defaulting to one million. The main statement, the setup statement and the timer function to be used are passed to the constructor.Note: By default,
timeit
temporarily turns offgarbage collection
during the timing. The advantage of this approach is that it makes independent timings more comparable. This disadvantage is that GC may be an important component of the performance of the function being measured. If so, GC can be re-enabled as the first statement in the setup string. For example:
timeit.Timer('for i in xrange(10): oct(i)', 'gc.enable()').timeit()
Profiling will give you a much more detailed idea about what's going on. Here's the "instant example" from the official docs:
import cProfile import re cProfile.run('re.compile("foo|bar")')
Which will give you:
197 function calls (192 primitive calls) in 0.002 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.001 0.001 <string>:1(<module>) 1 0.000 0.000 0.001 0.001 re.py:212(compile) 1 0.000 0.000 0.001 0.001 re.py:268(_compile) 1 0.000 0.000 0.000 0.000 sre_compile.py:172(_compile_charset) 1 0.000 0.000 0.000 0.000 sre_compile.py:201(_optimize_charset) 4 0.000 0.000 0.000 0.000 sre_compile.py:25(_identityfunction) 3/1 0.000 0.000 0.000 0.000 sre_compile.py:33(_compile)
Both of these modules should give you an idea about where to look for bottlenecks.
Also, to get to grips with the output of profile
, have a look at this post
NOTE pycallgraph has been officially abandoned since Feb. 2018. As of Dec. 2020 it was still working on Python 3.6 though. As long as there are no core changes in how python exposes the profiling API it should remain a helpful tool though.
This module uses graphviz to create callgraphs like the following:
You can easily see which paths used up the most time by colour. You can either create them using the pycallgraph API, or using a packaged script:
pycallgraph graphviz -- ./mypythonscript.py
The overhead is quite considerable though. So for already long-running processes, creating the graph can take some time.
I use a simple decorator to time the func
import time def st_time(func): """ st decorator to calculate the total time of a func """ def st_func(*args, **keyArgs): t1 = time.time() r = func(*args, **keyArgs) t2 = time.time() print("Function=%s, Time=%s" % (func.__name__, t2 - t1)) return r return st_func
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With