Note: the alleged duplicate question is, I think, mostly related to "<" and ">" comparison, but not "==" comparison and hence does not answer my question about performance of the "==" operator.
For a long time, I have believed that "processing" a sorted array should be faster than an unsorted array. At first, I thought using "==" in a sorted array should be faster than in unsorted array because - I guess - of how branch prediction works:
UNSORTEDARRAY:
5 == 100 F 43 == 100 F 100 == 100 T 250 == 100 F 6 == 100 F (other elements to check)
SORTEDARRAY:
5 == 100 F 6 == 100 F 43 == 100 F 100 == 100 T (no need to check other elements, so all are F)
so I guess SORTEDARRAY should be faster than UNSORTEDARRAY, but today I used the code to generate 2 arrays in a header to test, and branch prediction seemed not to work as I thought.
I generated an unsorted array and a sorted array to test:
srand(time(NULL)); int UNSORTEDARRAY[524288]; int SORTEDARRAY[sizeof(UNSORTEDARRAY)/sizeof(int)]; for(int i=0;i<sizeof(SORTEDARRAY)/sizeof(int);i++){ SORTEDARRAY[i]=UNSORTEDARRAY[i]=rand(); } sort(SORTEDARRAY,SORTEDARRAY+sizeof(SORTEDARRAY)/sizeof(int)); string u="const int UNSORTEDARRAY[]={"; string s="const int SORTEDARRAY[]={"; for(int i=0;i<sizeof(UNSORTEDARRAY)/sizeof(int);i++){ u+=to_string(UNSORTEDARRAY[i])+","; s+=to_string(SORTEDARRAY[i])+","; } u.erase(u.end()-1); s.erase(s.end()-1); u+="};\n"; s+="};\n"; ofstream out("number.h"); string code=u+s; out << code; out.close();
so to test, just count if the value is == RAND_MAX/2 like this:
#include "number.h" int main(){ int count; clock_t start = clock(); for(int i=0;i<sizeof(SORTEDARRAY)/sizeof(int);i++){ if(SORTEDARRAY[i]==RAND_MAX/2){ count++; } } printf("%f\n",(float)(clock()-start)/CLOCKS_PER_SEC); }
run 3 times:
UNSORTEDARRAY
0.005376 0.005239 0.005220
SORTEDARRAY
0.005334 0.005120 0.005223
it seems a small performance difference, so I didn't believe it and then tried to change "SORTEDARRAY[i]==RAND_MAX/2" to "SORTEDARRAY[i]>RAND_MAX/2" to see if it made a difference:
UNSORTEDARRAY
0.008407 0.008363 0.008606
SORTEDARRAY
0.005306 0.005227 0.005146
this time there is a great difference.
Is "==" in the sorted array not faster than an unsorted array? If yes, why is ">" in sorted array faster than an unsorted array but "==" is not?
In C++, it is faster to process a sorted array than an unsorted array because of branch prediction. In computer architecture, a branch prediction determines whether a conditional branch (jump) in the instruction flow of a program is likely to be taken or not. Branch prediction doesn't play a significant role here.
However, if k = Omega(log n) (i.e, k is bounded from below by log n ), then n log n = O(kn) , and sorting before searching is faster (asymptotically).
Which is the fastest search algorithm for unsorted array? If the array is not sorted then you have to search for the element by iteration ,linear search . There is no better way than O(n).
There are many well-known methods by which an array can be sorted, which include, but are not limited to: Selection sort, Bubble sort, Insertion sort, Merge sort, Quicksort, Heapsort, and Counting sort.
One thing that immediately comes to mind is CPU's branch prediction algorithm.
In case of >
comparison, in sorted array the branching behavior is very consistent: first, the if
condition is consistently false, then it is consistently true. This aligns very well with even the simplest branch prediction.
In unsorted array the result of >
condition is essentially random, thus thwarting any branch prediction.
This is what makes the sorted version faster.
In case of ==
comparison, most of the time the condition is false and only very rarely it is true. This works well with branch prediction regardless of whether the array is sorted or not. The timings are essentially the same.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With