Let's say I'm writing my own StringBuilder in a compiled language (e.g. C++). What is the best way to measure the performance of various implementations? Simply timing a few hundred thousand runs yields highly inconsistent results: the timings can differ by as much as 15% from one batch to the next, which makes it impossible to accurately assess improvements whose gains are smaller than that.
I've done the following:
This stabilized the results somewhat. Any other ideas?
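For reference, the kind of batch timing described above can be sketched roughly as follows. This is only a minimal sketch: `append_workload`, `BatchStats`, and `time_batches` are hypothetical names made up for illustration, and `std::string` stands in for the StringBuilder under test. Reporting the minimum and median per batch instead of a single total is one common way to make batch-to-batch noise more visible; it does not remove the noise.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical workload standing in for one StringBuilder implementation.
std::size_t append_workload(std::size_t pieces) {
    std::string s;
    for (std::size_t i = 0; i < pieces; ++i) s += "abc";
    return s.size();  // returned so the result is actually used
}

// Summary of the per-batch wall-clock times, in nanoseconds.
struct BatchStats {
    double min_ns;
    double median_ns;
    std::size_t checksum;  // sum of all workload results
};

// Runs `batches` batches of `runs` calls each and times every batch.
template <typename F>
BatchStats time_batches(F&& work, std::size_t batches, std::size_t runs) {
    assert(batches > 0);  // at least one sample is needed below
    using clock = std::chrono::steady_clock;
    std::vector<double> samples;
    samples.reserve(batches);
    std::size_t checksum = 0;
    for (std::size_t b = 0; b < batches; ++b) {
        const auto start = clock::now();
        for (std::size_t r = 0; r < runs; ++r) checksum += work();
        const auto stop = clock::now();
        samples.push_back(
            std::chrono::duration<double, std::nano>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return {samples.front(), samples[samples.size() / 2], checksum};
}
```

A call such as `time_batches([] { return append_workload(1000); }, 20, 100000)` would give the fastest and the median batch time. Comparing the two gives a rough idea of how much the batches vary.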
I have achieved 100% consistent results in this manner:
cli / sti instructions (note that the binary won't run on modern OSes after this change).
rdtsc deltas for timing. The samples should be within the cli…sti instructions.
The result seems to be completely deterministic, but it is not an accurate assessment of overall performance (see the discussion under Osman Turan's answer for details).
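As an illustration of the rdtsc-delta idea, here is a minimal sketch. It is not the code used for the results above: `read_ticks` and `tick_delta` are hypothetical names, the `__rdtsc()` intrinsic is assumed to come from the compiler's x86 intrinsics header, and the cli / sti pair is only marked in comments because those instructions fault in an ordinary user-mode process. On non-x86 targets the sketch falls back to `std::chrono::steady_clock` purely so that it still compiles.

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#  if defined(_MSC_VER)
#    include <intrin.h>     // __rdtsc on MSVC
#  else
#    include <x86intrin.h>  // __rdtsc on GCC/Clang
#  endif
// Reads the CPU time-stamp counter.
inline std::uint64_t read_ticks() { return __rdtsc(); }
#else
// Fallback (assumption for portability only): clock ticks, not CPU cycles.
inline std::uint64_t read_ticks() {
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
}
#endif

// Returns the tick delta measured around one call to `work`.
template <typename F>
std::uint64_t tick_delta(F&& work) {
    // In the DOS setup described above, cli would be issued here ...
    const std::uint64_t start = read_ticks();
    work();
    const std::uint64_t stop = read_ticks();
    // ... and sti here, so that no interrupt lands inside the sample.
    // No serializing instruction is used, so out-of-order execution can
    // blur very short measurements.
    return stop - start;
}
```

Without interrupts disabled, the delta still includes whatever the OS does in between, which is why a sample taken this way on a modern OS is not deterministic.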
As a bonus tip, here's an easy way to share files with Bochs (so you don't have to unmount/rebuild/remount the floppy image every time):
On Windows, Bochs will lock the floppy image file, but the file is still opened in shared-write mode. This means that you can't overwrite the file, but you can write to it. (I think *nix OSes might cause overwriting to create a new file, as far as file descriptors are concerned.) The trick is to use dd. I had the following batch script set up:
... benchmark build commands here ...
copy /Y C:\Path\To\Benchmark\Project\test2dos.exe floppy\test2.exe
bfi -t=288 -f=floppysrc.img floppy
dd if=floppysrc.img of=floppy.img
bfi is Bart's Build Floppy Image.
Then, just mount floppy.img in Bochs.
Bonus tip #2: To avoid having to manually start the benchmark every time in Bochs, put an empty go.txt file in the floppy directory, and run this batch in Bochs:
@echo off
A:
:loop
choice /T:y,1 > nul
if not exist go.txt goto loop
del go.txt
echo ---------------------------------------------------
test2
goto loop
It will start the test program every time it detects a fresh floppy image. This way, you can automate a benchmark run in a single script.
Update: this method is not very reliable. Sometimes the timings would change by as much as 200% just from reordering some tests (these timing changes were not observed when run on real hardware, using the methods described in the original question).