Using two identical mergesort implementations, I tested the execution speed of C++ (using Visual C++ 2010 Express) against Java (using NetBeans 7.0). I expected the C++ version to be at least slightly faster, but testing showed that it was 4 to 10 times slower than the Java version. I believe I have enabled all the speed optimisations for C++, and I am building in release mode rather than debug mode. Why does this speed discrepancy occur?
public class PerformanceTest1
{
    /**
     * Sorts the array using a merge sort algorithm
     * @param array The array to be sorted
     * @return The sorted array
     */
    public static void sort(double[] array)
    {
        if(array.length > 1)
        {
            int centre;
            double[] left;
            double[] right;
            int arrayPointer = 0;
            int leftPointer = 0;
            int rightPointer = 0;
            centre = (int)Math.floor((array.length) / 2.0);
            left = new double[centre];
            right = new double[array.length - centre];
            System.arraycopy(array,0,left,0,left.length);
            System.arraycopy(array,centre,right,0,right.length);
            sort(left);
            sort(right);
            while((leftPointer < left.length) && (rightPointer < right.length))
            {
                if(left[leftPointer] <= right[rightPointer])
                {
                    array[arrayPointer] = left[leftPointer];
                    leftPointer += 1;
                }
                else
                {
                    array[arrayPointer] = right[rightPointer];
                    rightPointer += 1;
                }
                arrayPointer += 1;
            }
            if(leftPointer < left.length)
            {
                System.arraycopy(left,leftPointer,array,arrayPointer,array.length - arrayPointer);
            }
            else if(rightPointer < right.length)
            {
                System.arraycopy(right,rightPointer,array,arrayPointer,array.length - arrayPointer);
            }
        }
    }
    public static void main(String args[])
    {
        //Number of elements to sort
        int arraySize = 1000000;
        //Create the variables for timing
        double start;
        double end;
        double duration;
        //Build array
        double[] data = new double[arraySize];
        for(int i = 0;i < data.length;i += 1)
        {
            data[i] = Math.round(Math.random() * 10000);
        }
        //Run performance test
        start = System.nanoTime();
        sort(data);
        end = System.nanoTime();
        //Output performance results
        duration = (end - start) / 1E9;
        System.out.println("Duration: " + duration);
    }
}
#include <iostream>
#include <windows.h>
using namespace std;
//Mergesort
void sort1(double *data,int size)
{
    if(size > 1)
    {
        int centre;
        double *left;
        int leftSize;
        double *right;
        int rightSize;
        int dataPointer = 0;
        int leftPointer = 0;
        int rightPointer = 0;
        centre = (int)floor((size) / 2.0);
        leftSize = centre;
        left = new double[leftSize];
        for(int i = 0;i < leftSize;i += 1)
        {
            left[i] = data[i];
        }
        rightSize = size - leftSize;
        right = new double[rightSize];
        for(int i = leftSize;i < size;i += 1)
        {
            right[i - leftSize] = data[i];
        }
        sort1(left,leftSize);
        sort1(right,rightSize);
        while((leftPointer < leftSize) && (rightPointer < rightSize))
        {
            if(left[leftPointer] <= right[rightPointer])
            {
                data[dataPointer] = left[leftPointer];
                leftPointer += 1;
            }
            else
            {
                data[dataPointer] = right[rightPointer];
                rightPointer += 1;
            }
            dataPointer += 1;
        }
        if(leftPointer < leftSize)
        {
            for(int i = dataPointer;i < size;i += 1)
            {
                data[i] = left[leftPointer++];
            }
        }
        else if(rightPointer < rightSize)
        {
            for(int i = dataPointer;i < size;i += 1)
            {
                data[i] = right[rightPointer++];
            }
        }
        delete left;
        delete right;
    }
}
void main()
{
    //Number of elements to sort
    int arraySize = 1000000;
    //Create the variables for timing
    LARGE_INTEGER start; //Starting time
    LARGE_INTEGER end; //Ending time
    LARGE_INTEGER freq; //Rate of time update
    double duration; //end - start
    QueryPerformanceFrequency(&freq); //Determine the frequency of the performance counter (high precision system timer)
    //Build array
    double *temp2 = new double[arraySize];
    QueryPerformanceCounter(&start);
    srand((int)start.QuadPart);
    for(int i = 0;i < arraySize;i += 1)
    {
        double randVal = rand() % 10000;
        temp2[i] = randVal;
    }
    //Run performance test
    QueryPerformanceCounter(&start);
    sort1(temp2,arraySize);
    QueryPerformanceCounter(&end);
    delete temp2;
    //Output performance test results
    duration = (double)(end.QuadPart - start.QuadPart) / (double)(freq.QuadPart);
    cout << "Duration: " << duration << endl;
    //Dramatic pause
    system("pause");
}
For 10000 elements, the C++ execution takes roughly 4 times as long as the Java execution. For 100000 elements, the ratio is about 7:1. For 10000000 elements, the ratio is about 10:1. For more than 10000000 elements, the Java execution completes, but the C++ execution stalls and I have to kill the process manually.
Whoa, even with the abstraction layers, the Java version is faster than the C++ one! This is possible because Java Virtual Machines are not pure interpreters; they compile Java's platform-independent bytecode to native code "just in time", while the program is running.
As a rule of thumb, when you convert Java to C++, the code is about 3x slower. This doesn't make sense at first, until you consider that code written in Java is structured the way Java code tends to be written, which is not at all how someone who works in C++ would structure C++ code.
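As a rough illustration of that structural difference, here is a minimal sketch of the same merge sort written the way a C++ programmer might approach it: one scratch buffer allocated up front instead of two fresh arrays per recursive call. The names mergeSort and mergeSortRange are made up for this example, it assumes a C++11 compiler, and it is not the code anyone in this thread benchmarked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts data[lo, hi) using scratch as temporary storage.
// No memory is allocated inside the recursion.
static void mergeSortRange(std::vector<double>& data,
                           std::vector<double>& scratch,
                           std::size_t lo, std::size_t hi)
{
    if (hi - lo < 2) return;                 // 0 or 1 element: already sorted
    const std::size_t mid = lo + (hi - lo) / 2;
    mergeSortRange(data, scratch, lo, mid);
    mergeSortRange(data, scratch, mid, hi);
    // Merge the two sorted halves into scratch, then copy the result back.
    std::merge(data.begin() + lo, data.begin() + mid,
               data.begin() + mid, data.begin() + hi,
               scratch.begin() + lo);
    std::copy(scratch.begin() + lo, scratch.begin() + hi, data.begin() + lo);
}

// Hypothetical entry point: allocates the scratch buffer exactly once.
void mergeSort(std::vector<double>& data)
{
    std::vector<double> scratch(data.size());
    assert(scratch.size() == data.size());
    mergeSortRange(data, scratch, 0, data.size());
}
```

Calling mergeSort(v) on a std::vector<double> sorts it in place. Whether this closes the gap on a particular machine is something to measure rather than assume.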
I think there might be a mistake in the way you ran the program. When you hit F5 in Visual C++ Express, the program runs under the debugger and will be a LOT slower. In other editions of Visual C++ 2010 (e.g. Ultimate, which I use), try hitting CTRL+F5 (i.e. Start Without Debugging), or in Express try running the executable file directly, and you will see the difference.
I ran your program on my machine with only one modification: I added delete[] left; delete[] right; to get rid of the memory leak (otherwise it would run out of memory in 32-bit mode!). I have an i7 950. To be fair, I also passed the same array to Arrays.sort() in Java and to std::sort in C++. I used an array size of 10,000,000.
Here are the results (time in seconds):
Java code: 7.13          Java Arrays.sort: 0.93
32-bit C++ code: 3.57    C++ std::sort: 0.81
64-bit C++ code: 2.77    C++ std::sort: 0.76
So the C++ code is much faster, and even the standard library sort, which is highly tuned in both Java and C++, tends to show a slight advantage for C++.
Edit: I just realized that in your original test you ran the C++ code in debug mode. You should switch to Release mode AND run it outside the debugger (as I explained in my post) to get a fair result.
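For anyone who wants to reproduce the std::sort baseline from this comparison, a sketch along these lines might do. It assumes a C++11 compiler with <chrono> and <random> (Visual C++ 2010 does not ship <chrono>, so the question's QueryPerformanceCounter approach would be needed there). The helper names makeData, timeSeconds and timeStdSort are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: n pseudo-random values in [0, 10000),
// similar in spirit to the question's rand() % 10000.
std::vector<double> makeData(std::size_t n, unsigned seed)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 9999);
    std::vector<double> data(n);
    for (std::size_t i = 0; i < n; ++i) data[i] = dist(gen);
    return data;
}

// Hypothetical helper: runs work() once and returns elapsed seconds.
template <typename Work>
double timeSeconds(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Times std::sort on a private copy, so the caller's data stays unsorted
// and can be handed to another sort for a like-for-like comparison.
double timeStdSort(std::vector<double> data)
{
    assert(!data.empty());
    return timeSeconds([&data] { std::sort(data.begin(), data.end()); });
}
```

A single run is noisy, so it is probably worth repeating the measurement a few times and, as noted above, doing so with an optimised build started outside the debugger.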
I don't program C++ professionally (or even unprofessionally :) but I notice that you are allocating an array of doubles on the heap (double *temp2 = new double[arraySize];). This is expensive compared to Java initialisation but, more importantly, it constitutes a memory leak since you never delete it. That could explain why your C++ implementation stalls: it has basically run out of memory.
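To make the ownership point concrete, here is a small sketch contrasting the two usual ways of handling a heap array in C++. Memory obtained with new[] has to be released with delete[] (plain delete on an array is undefined behaviour), whereas a std::vector releases its storage automatically. The function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Manual ownership: every new[] needs a matching delete[].
double sumWithRawArray(std::size_t n)
{
    double* values = new double[n];
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        values[i] = static_cast<double>(i);
        total += values[i];
    }
    delete[] values;   // array form; forgetting this line leaks n doubles
    return total;
}

// Automatic ownership: the vector frees its storage when it goes out of scope.
double sumWithVector(std::size_t n)
{
    std::vector<double> values(n);
    assert(values.size() == n);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        values[i] = static_cast<double>(i);
        total += values[i];
    }
    return total;      // no explicit cleanup needed
}
```

In a recursive sort that allocates on every call, a missed or mismatched delete adds up quickly, which is one reason the vector form tends to be the safer default.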