I have a multithreaded mergesorting program in C, and a program for benchmark testing it with 0, 1, 2, or 4 threads. I also wrote a program in Python to do multiple tests and aggregate the results.
The weird thing is that when I run the Python script, the tests always finish in about half the time compared to running them directly in the shell.
For example, when I run the testing program by itself with 4 million integers to sort (the last two arguments are the seed and modulus for generating integers):
$ ./mergetest 4000000 4194819 140810581084
0 threads: 1.483485s wall; 1.476092s user; 0.004001s sys
1 threads: 1.489206s wall; 1.488093s user; 0.000000s sys
2 threads: 0.854119s wall; 1.608100s user; 0.008000s sys
4 threads: 0.673286s wall; 2.224139s user; 0.024002s sys
Using the python script:
$ ./mergedata.py 1 4000000
Average runtime for 1 runs with 4000000 items each:
0 threads: 0.677512s wall; 0.664041s user; 0.016001s sys
1 threads: 0.709118s wall; 0.704044s user; 0.004001s sys
2 threads: 0.414058s wall; 0.752047s user; 0.028001s sys
4 threads: 0.373708s wall; 1.24008s user; 0.024002s sys
This happens no matter how many items I'm sorting or how many times I run it. The Python script calls the tester with the subprocess module, then parses and aggregates the output. Any ideas why this would happen? Is Python somehow optimizing the execution, or is there something slowing it down when I run it directly that I'm not aware of?
Code: https://gist.github.com/2650009
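The actual code is in the gist; as a rough sketch of the parse-and-aggregate step, it does something like this (the function names and regex here are mine, not the gist's):

```python
import re

# Matches lines like "2 threads: 0.854119s wall; 1.608100s user; 0.008000s sys"
LINE = re.compile(
    r"(\d+) threads:\s*([\d.]+)s wall;\s*([\d.]+)s user;\s*([\d.]+)s sys")

def parse_run(output):
    """Parse one mergetest run into {thread_count: (wall, user, sys)}."""
    results = {}
    for m in LINE.finditer(output):
        threads, wall, user, sys_time = m.groups()
        results[int(threads)] = (float(wall), float(user), float(sys_time))
    return results

def average(runs):
    """Average a list of parse_run() dicts field by field."""
    avg = {}
    for threads in runs[0]:
        cols = [run[threads] for run in runs]
        avg[threads] = tuple(sum(c) / len(c) for c in zip(*cols))
    return avg
```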
Turns out I was passing sys.maxint to the subprocess as the modulus for generating random numbers. C truncated the 64-bit integer to a 32-bit int, and the low 32 bits (all ones) read as -1 in two's complement, so every random number was taken mod -1 and became 0. So, sorting all identical values seems to take about half as much time as sorting random data.