Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Clarification of statement of performance of collection's binary search from javadoc

I am confused on the performance analysis of binarySearch from the Collections

It says:

If the specified list does not implement the RandomAccess interface and is large, this method will do an iterator-based binary search that performs O(n) link traversals and O(log n) element comparisons.

I am not sure how to interpret this O(n) + O(log n).

I mean isn't it worse than simply traversing the linked-list and compare? We still get only O(n).

So what does this statement mean about performance? As phrased, I can not understand the difference from a plain linear search in the linked list.

What am I missunderstanding here?

like image 882
Cratylus Avatar asked Jan 24 '12 20:01

Cratylus


2 Answers

First of all you must understand that without RandomAccess interface the binarySearch cannot simply access, well, random element from the list, but instead it has to use an iterator. That introduces O(n) cost. When the collection implements RandomAccess, cost of each element access is O(1) and can be ignored as far as asymptotic complexity is concerned.

Because O(n) is greater than O(log n) it will always take precedence over O(log n) and dominate the complexity. In this case binarySearch has the same complexity as simple linear search. So what is the advantage?

Linear search performs O(n) comparisons, as opposed to O(log n) with binarySearch without random access. This is especially important when the constant before O(logn) is high. In plain English: when single comparison has a very high cost compared to advancing iterator. This might be quite common scenario, so limiting the number of comparisons is beneficial. Profit!

like image 102
Tomasz Nurkiewicz Avatar answered Oct 05 '22 10:10

Tomasz Nurkiewicz


Binary search is not suited for linked lists. The algorithm is supposed to benefit from a sorted collection with random access (like a plain array), where it can quickly jump from one element to another, splitting the remaining search space in two on each iteration (hence the O(log N) time complexity).

For a linked list, there is a modified version which iterates through all elements (and needs to go through 2n elements in the worst case), but instead of comparing every element, it "probes" the list at specified positions only (hence doing a lower number of comparisons compared to a linear search).

Since comparisons are usually slightly more costly compared to plain pointer iterating, total time should be lower. That is why the log N part is emphasized separately.

like image 20
Groo Avatar answered Oct 05 '22 10:10

Groo