Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is "==" in sorted array not faster than unsorted array? [duplicate]

Note: the alleged duplicate question is, I think, mostly related to "<" and ">" comparison, but not "==" comparison and hence does not answer my question about performance of the "==" operator.

For a long time, I have believed that "processing" a sorted array should be faster than an unsorted array. At first, I thought using "==" in a sorted array should be faster than in unsorted array because - I guess - of how branch prediction works:

UNSORTEDARRAY:

5 == 100 F 43 == 100 F 100 == 100 T 250 == 100 F 6 == 100 F (other elements to check) 

SORTEDARRAY:

5 == 100 F 6 == 100 F 43 == 100 F 100 == 100 T (no need to check other elements, so all are F) 

so I guess SORTEDARRAY should be faster than UNSORTEDARRAY, but today I used the code to generate 2 arrays in a header to test, and branch prediction seemed not to work as I thought.

I generated an unsorted array and a sorted array to test:

srand(time(NULL)); int UNSORTEDARRAY[524288]; int SORTEDARRAY[sizeof(UNSORTEDARRAY)/sizeof(int)]; for(int i=0;i<sizeof(SORTEDARRAY)/sizeof(int);i++){     SORTEDARRAY[i]=UNSORTEDARRAY[i]=rand(); } sort(SORTEDARRAY,SORTEDARRAY+sizeof(SORTEDARRAY)/sizeof(int)); string u="const int UNSORTEDARRAY[]={"; string s="const int SORTEDARRAY[]={"; for(int i=0;i<sizeof(UNSORTEDARRAY)/sizeof(int);i++){     u+=to_string(UNSORTEDARRAY[i])+",";     s+=to_string(SORTEDARRAY[i])+","; } u.erase(u.end()-1); s.erase(s.end()-1); u+="};\n"; s+="};\n"; ofstream out("number.h"); string code=u+s; out << code; out.close(); 

so to test, just count if the value is == RAND_MAX/2 like this:

#include "number.h" int main(){ int count;     clock_t start = clock();     for(int i=0;i<sizeof(SORTEDARRAY)/sizeof(int);i++){         if(SORTEDARRAY[i]==RAND_MAX/2){             count++;         }     }     printf("%f\n",(float)(clock()-start)/CLOCKS_PER_SEC); } 

run 3 times:

UNSORTEDARRAY

0.005376 0.005239 0.005220 

SORTEDARRAY

0.005334 0.005120 0.005223 

it seems a small performance difference, so I didn't believe it and then tried to change "SORTEDARRAY[i]==RAND_MAX/2" to "SORTEDARRAY[i]>RAND_MAX/2" to see if it made a difference:

UNSORTEDARRAY

0.008407 0.008363 0.008606 

SORTEDARRAY

0.005306 0.005227 0.005146 

this time there is a great difference.

Is "==" in the sorted array not faster than an unsorted array? If yes, why is ">" in sorted array faster than an unsorted array but "==" is not?

like image 692
ggrr Avatar asked Aug 18 '15 03:08

ggrr


People also ask

Why is it faster to process a sorted array than an unsorted array?

In C++, it is faster to process a sorted array than an unsorted array because of branch prediction. In computer architecture, a branch prediction determines whether a conditional branch (jump) in the instruction flow of a program is likely to be taken or not. Branch prediction doesn't play a significant role here.

Is it faster to sort an array before searching?

However, if k = Omega(log n) (i.e, k is bounded from below by log n ), then n log n = O(kn) , and sorting before searching is faster (asymptotically).

Which is the fastest search algorithm for unsorted array?

Which is the fastest search algorithm for unsorted array? If the array is not sorted then you have to search for the element by iteration ,linear search . There is no better way than O(n).

In which process data within the array is sorted?

There are many well-known methods by which an array can be sorted, which include, but are not limited to: Selection sort, Bubble sort, Insertion sort, Merge sort, Quicksort, Heapsort, and Counting sort.


1 Answers

One thing that immediately comes to mind is CPU's branch prediction algorithm.

In case of > comparison, in sorted array the branching behavior is very consistent: first, the if condition is consistently false, then it is consistently true. This aligns very well with even the simplest branch prediction.

In unsorted array the result of > condition is essentially random, thus thwarting any branch prediction.

This is what makes the sorted version faster.

In case of == comparison, most of the time the condition is false and only very rarely it is true. This works well with branch prediction regardless of whether the array is sorted or not. The timings are essentially the same.

like image 109
AnT Avatar answered Sep 24 '22 18:09

AnT