I have the same program written in both C++ and Java. For C++ I am using VS 2019 and for Java using Eclipse 2019-03.
Here is the program in C++.
#define InputSize 500000
int FindDuplicate::FindDuplicateNaive(int* input, int size)
{
int j;
for (int i = 0; i < size-1; i++)
{
for ( j= i+1; j < size; j++)
{
if (input[i] == input[j])
return input[i];
}
}
return -1;
}
int* FindDuplicate::CreateTestCase(int size)
{
int* output = new int[size];
int i;
for ( i= 0; i < size-1; i++)
{
output[i] = i + 1;
}
output[i] = i;
return output;
}
int main()
{
int* input= FindDuplicate::CreateTestCase(InputSize);
auto start = std::chrono::system_clock::now();//clock start
int output = FindDuplicate::FindDuplicateNaive(input, InputSize);
auto end = std::chrono::system_clock::now();//clock end
cout<<"Output is: "<<output<<endl;
std::chrono::duration<double> elapsed_seconds = end - start;
cout<< "elapsed time: " << elapsed_seconds.count() << "s\n";
}
Here is the Java program...
public class FindDuplicate {
public static int FindDuplicateNaive(int[] input) {
for (int i = 0; i < input.length - 1; i++) {
for (int j = i + 1; j < input.length; j++) {
if (input[i] == input[j])
return input[i];
}
}
return -1;
}
public static int[] CreateTestCase(int n) {
// 1, 2, 3, 4, 5, 1 = n = 6
int[] output = new int[n];
int i;
for (i = 0; i < n - 1; i++) {
output[i] = i + 1;
}
output[i] = i;
return output;
}
public static void main(String[] args)
{
//Here also args[0] is 5,00,000
int number = Integer.parseInt(args[0]);
int[] input = CreateTestCase(number);
long start = System.currentTimeMillis();
int output = FindDuplicateNaive(input);
long end = System.currentTimeMillis();
System.out.println("Total time taken is: " + (end - start) / 1000.0 + " secs");
System.out.println(output);
}
You will be shocked to know the time taken by the same program the same input in both c++ and Java.
In Java:
Toal time taken is: 41.876 secs
499999
In CPP:
After enabling the optimization and in release mode,
Output is: 499999
elapsed time: 64.0293s
Any thought on this, what could be the reason? Why Java is taking on 41.876 secs whereas CPP take 64.0293 sec??
Since vectorisation cannot take place easily, most of the time is spent in loop control.
Thanks to the use of #pragma GCC unroll N
on the inner loop, which helps investigating, loop unrolling provides an explanation of the OP's results.
I obtain these average results (console excluded from the timings):
gcc 8.3, -03, unroll 64 1.63s
gcc 8.3, -03, unroll 32 1.66s
gcc 8.3, -03, unroll 16 1.71s
gcc 8.3, -03, unroll 8 1.81s
gcc 8.3, -03, unroll 4 1.97s
gcc 8.3, -03, unroll 2 2.33s
gcc 8.3, -03, no unroll 3.06s
openjdk 10.0.2 1.93s
edit: these tests were run with InputSize=100'000 as in the original question (changed to 500'000 afterwards)
The main difference is loop unrolling.
Java unrolled the inner loop very cleverly, while GCC/clang/MSVC/ICC don't unroll it (this is a missed optimization of these compilers).
If you manually unroll the loop, you can speed it up to have the similar speed as the java version, something like this:
for ( j= i+1; j < size-3; j+=4)
{
if (input[i] == input[j])
return input[i];
if (input[i] == input[j+1])
return input[i];
if (input[i] == input[j+2])
return input[i];
if (input[i] == input[j+3])
return input[i];
}
for (; j < size; j++)
{
if (input[i] == input[j])
return input[i];
}
For proof, here's the inner loop of the java version (8x unroll):
0x00007f13a5113f60: mov 0x10(%rsi,%rdx,4),%ebx ;*iaload
; - FindDuplicate::FindDuplicateNaive@25 (line 6)
0x00007f13a5113f64: cmp %ebx,%ecx
0x00007f13a5113f66: je 0x7f13a5113fcb ;*if_icmpne
; - FindDuplicate::FindDuplicateNaive@26 (line 6)
0x00007f13a5113f68: movsxd %edx,%rdi
0x00007f13a5113f6b: mov 0x14(%rsi,%rdi,4),%ebx ;*iaload
; - FindDuplicate::FindDuplicateNaive@25 (line 6)
0x00007f13a5113f6f: cmp %ebx,%ecx
0x00007f13a5113f71: je 0x7f13a5113fc9 ;*if_icmpne
; - FindDuplicate::FindDuplicateNaive@26 (line 6)
0x00007f13a5113f73: mov 0x18(%rsi,%rdi,4),%ebx ;*iaload
; - FindDuplicate::FindDuplicateNaive@25 (line 6)
0x00007f13a5113f77: cmp %ebx,%ecx
0x00007f13a5113f79: je 0x7f13a5113fed ;*if_icmpne
; - FindDuplicate::FindDuplicateNaive@26 (line 6)
0x00007f13a5113f7b: mov 0x1c(%rsi,%rdi,4),%ebx ;*iaload
; - FindDuplicate::FindDuplicateNaive@25 (line 6)
0x00007f13a5113f7f: cmp %ebx,%ecx
0x00007f13a5113f81: je 0x7f13a5113ff2 ;*if_icmpne
; - FindDuplicate::FindDuplicateNaive@26 (line 6)
0x00007f13a5113f83: mov 0x20(%rsi,%rdi,4),%ebx ;*iaload
; - FindDuplicate::FindDuplicateNaive@25 (line 6)
0x00007f13a5113f87: cmp %ebx,%ecx
0x00007f13a5113f89: je 0x7f13a5113ff7 ;*if_icmpne
; - FindDuplicate::FindDuplicateNaive@26 (line 6)
0x00007f13a5113f8b: mov 0x24(%rsi,%rdi,4),%ebx ;*iaload
; - FindDuplicate::FindDuplicateNaive@25 (line 6)
0x00007f13a5113f8f: cmp %ebx,%ecx
0x00007f13a5113f91: je 0x7f13a5113ffc ;*if_icmpne
; - FindDuplicate::FindDuplicateNaive@26 (line 6)
0x00007f13a5113f93: mov 0x28(%rsi,%rdi,4),%ebx ;*iaload
; - FindDuplicate::FindDuplicateNaive@25 (line 6)
0x00007f13a5113f97: cmp %ebx,%ecx
0x00007f13a5113f99: je 0x7f13a5114001 ;*if_icmpne
; - FindDuplicate::FindDuplicateNaive@26 (line 6)
0x00007f13a5113f9b: mov 0x2c(%rsi,%rdi,4),%ebx ;*iaload
; - FindDuplicate::FindDuplicateNaive@25 (line 6)
0x00007f13a5113f9f: cmp %ebx,%ecx
0x00007f13a5113fa1: je 0x7f13a5114006 ;*if_icmpne
; - FindDuplicate::FindDuplicateNaive@26 (line 6)
0x00007f13a5113fa3: add $0x8,%edx ;*iinc
; - FindDuplicate::FindDuplicateNaive@33 (line 5)
0x00007f13a5113fa6: cmp %r8d,%edx
0x00007f13a5113fa9: jl 0x7f13a5113f60 ;*if_icmpge
; - FindDuplicate::FindDuplicateNaive@17 (line 5)
This is not a complete answer, I cannot explain why it is actually running faster in Java than C++; but i can explain a couple of things that hold your C++ version performance back. Please don't select this as correct answer in case someone does have an actual explanation for the total difference in performance.
This answer has been discussed on meta and agreed that leaving it as a partial answer temporarily is the best option.
First and most important, as others mentioned in the comments, Java code is already optimized when you test it, whereas in C++ you have to specify optimization level as a command line argument (form visual studio ide compile as release), and while this makes a lot of difference, in my tests Java is still on top (all results on bottom).
But i want to point out a major flaw in your test, that may seem unsignificant in this specific case, since it makes little difference when you look at the numbers, but is still important: Input-output operations add noticeable delays. For an accurate execution-time comparison you must exclude input-output operations from your timer in both languages. Although in this case it makes little difference, having one language perform both the function and the output while the timer is running, and the other perform just the function, makes your whole test biased and pointless.
To have it more equivalent to the Java version, change your c++ main to
int main()
{
int* input = FindDuplicate::CreateTestCase(InputSize);
int result;
auto start = std::chrono::system_clock::now(); //clock start
result = FindDuplicate::FindDuplicateNaive(input, InputSize);
auto end = std::chrono::system_clock::now(); //clock end
std::chrono::duration<double> elapsed_seconds = end - start;
cout << "Output is: " << result << endl;
cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
Note that by default C++'s console I/O (iostream, cin/cout) is even slower than it could be, because syncronization with C's console I/O (stdio, scanf/printf) is enabled to let a program not do weird things if both cout and printf are used. Here you can read about cout's performance when synchronization is turned off. Not only you used I/O inside your timer constraints, you even used it in its worst possible performance mode.
Here are my results, which while still giving Java an edge, shows how much difference certain compile options and I/O manipulations can make in C++ (for a single cout 0.03s of difference on average by turning syncronization off is bigger than it looks). All values in seconds are the average for 10 tests.
1. Java print in timer 1.52s
2. Java 1.36s
3. C++ debug, cout in timer 11.78s
4. C++ debug 11.73s
5. C++ release, cout in timer 3.32s
6. C++ release cout syncronization off 3.29s
7. C++ release 3.26s
I want you to understand that out of all these tests the only ones of which a comparison would make sense is 1 with 6 and 2 with 7. All the others (3, 4, 5) would make for a biased comparison regardless of how many times you repeat the test.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With