The Problem Description:
I have this custom "checksum" function:
NORMALIZER = 0x10000

def get_checksum(part1, part2, salt="trailing"):
    """Returns a checksum of two strings."""
    combined_string = part1 + part2 + " " + salt if part2 != "***" else part1
    ords = [ord(x) for x in combined_string]
    checksum = ords[0]  # initial value
    # TODO: document the logic behind the checksum calculations
    iterator = zip(ords[1:], ords)
    checksum += sum(x + 2 * y if counter % 2 else x * y
                    for counter, (x, y) in enumerate(iterator))
    checksum %= NORMALIZER
    return checksum
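To illustrate the undocumented logic the TODO refers to: zip(ords[1:], ords) pairs each character code with its predecessor, and the generator alternates between two mixing rules (x * y for even positions, x + 2 * y for odd ones). Here is a worked sketch on a three-character string; checksum_demo is a hypothetical reduction of the same logic to a single input string, not part of the original code:

```python
NORMALIZER = 0x10000

def checksum_demo(s):
    # Hypothetical single-string version of get_checksum's core logic
    ords = [ord(c) for c in s]
    checksum = ords[0]  # initial value
    # zip(ords[1:], ords) pairs each character code with its predecessor
    pairs = zip(ords[1:], ords)
    checksum += sum(x + 2 * y if counter % 2 else x * y
                    for counter, (x, y) in enumerate(pairs))
    return checksum % NORMALIZER

# "abc" -> ords [97, 98, 99]; pairs [(98, 97), (99, 98)]
# 97 + 98*97 + (99 + 2*98) = 97 + 9506 + 295 = 9898
print(checksum_demo("abc"))  # 9898
```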
Which I want to test performance-wise on both Python 3.6 and PyPy. I'd like to see whether the function performs better on PyPy, but I'm not completely sure what the most reliable and clean way to measure that is.
What I've tried and the Question:
Currently, I'm using timeit for both:
$ python3.6 -mtimeit -s "from test import get_checksum" "get_checksum('test1' * 100000, 'test2' * 100000)"
10 loops, best of 3: 329 msec per loop
$ pypy -mtimeit -s "from test import get_checksum" "get_checksum('test1' * 100000, 'test2' * 100000)"
10 loops, best of 3: 104 msec per loop
My concern is that I'm not absolutely sure whether timeit is the right tool for the job on PyPy, because of the potential JIT warm-up overhead.
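One way to control for warm-up explicitly is to call the function a few times before timing it, using only the standard library. A minimal sketch (the bench helper and its parameters are illustrative, not an established API):

```python
import time

def bench(func, *args, warmup=5, repeats=10):
    """Time func(*args), discarding an explicit warm-up phase."""
    # Warm-up calls give PyPy's tracing JIT a chance to compile the hot paths
    for _ in range(warmup):
        func(*args)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    # The minimum is usually the most stable statistic for micro-benchmarks
    return min(timings)

print(bench(sum, range(100_000), warmup=2, repeats=5))
```

The same script can then be run unmodified under python3.6 and pypy to compare the two interpreters on equal footing.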
Plus, PyPy itself prints the following warning before reporting the test results:
WARNING: timeit is a very unreliable tool. use perf or something else for real measurements
$ pypy -m pip install perf
$ pypy -m perf timeit -s 'from test import get_checksum' "get_checksum('test1' * 1000000, 'test2' * 1000000)"
What would be the best and most accurate approach to test the same exact function performance across these and potentially other Python implementations?
PyPy often runs faster than CPython because PyPy uses a just-in-time compiler. Most Python code runs well on PyPy, except for code that depends on CPython C extensions, which either does not work or incurs some overhead when run on PyPy.
PyPy works best with pure Python apps. NumPy, for instance, works very well with PyPy now. But if you want maximum compatibility with C extensions, use CPython.
In this small synthetic benchmark, PyPy is roughly 94 times as fast as CPython! For more serious benchmarks, you can take a look at the PyPy Speed Center, where the developers run nightly benchmarks with different executables.
PyPy is as fast as, or faster than, C/C++ in some applications and benchmarks. And with Python (or interpreted languages in general) you gain a REPL, a shorter write -> compile -> test loop, and generally speaking a higher rate of development.
You could increase the number of repetitions with the --repeat parameter in order to improve timing accuracy. See: https://docs.python.org/2/library/timeit.html
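The same --repeat behaviour is also available from Python code via timeit.repeat, which returns one total time per repetition (a sketch; the statement and loop counts here are arbitrary):

```python
import timeit

# repeat=5 mirrors the CLI's --repeat 5; number is the loop count per repetition
times = timeit.repeat("sum(range(1000))", repeat=5, number=1000)

# The timeit docs recommend taking the minimum of the repetitions,
# since higher values are typically caused by interference from other processes
print(min(times))
```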
It is not entirely clear what you are trying to measure. "Performance" can mean a variety of things depending on your use case.
You could increase --repeat a lot, like Haroldo_OK suggested. With enough repetitions, the time spent in other parts of your code would become progressively "insignificant". Of note, timeit turns off garbage collection, so if you're looking for "real world" measurements, maybe you want to turn it back on (see the link for how to do it).
If you're trying to improve the speed, a profiler like cProfile, which is supported by both Python 3.6 and PyPy, could help you isolate the code whose speed you want to measure.
I'm not actually answering your question, but I hope it helps :)
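To expand on the cProfile suggestion, here is a minimal sketch that runs unchanged on CPython and PyPy; checksum_work is a hypothetical stand-in for the question's get_checksum:

```python
import cProfile
import io
import pstats

def checksum_work(s):
    # Hypothetical stand-in workload for the question's get_checksum
    return sum(ord(c) for c in s) % 0x10000

profiler = cProfile.Profile()
profiler.enable()
checksum_work("test1" * 100000)
profiler.disable()

# Print the five most expensive calls, sorted by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

Note that profiling adds per-call overhead (especially on PyPy, where it can inhibit some JIT optimizations), so use it to find hot spots rather than for absolute timings.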