I am confused on the performance analysis of binarySearch
from the Collections
It says:
If the specified list does not implement the RandomAccess interface and is large, this method will do an iterator-based binary search that performs O(n) link traversals and O(log n) element comparisons.
I am not sure how to interpret this O(n)
+ O(log n)
.
I mean isn't it worse than simply traversing the linked-list and compare? We still get only O(n)
.
So what does this statement mean about performance? As phrased, I can not understand the difference from a plain linear search in the linked list.
What am I missunderstanding here?
First of all you must understand that without RandomAccess
interface the binarySearch
cannot simply access, well, random element from the list, but instead it has to use an iterator. That introduces O(n)
cost. When the collection implements RandomAccess
, cost of each element access is O(1)
and can be ignored as far as asymptotic complexity is concerned.
Because O(n)
is greater than O(log n)
it will always take precedence over O(log n)
and dominate the complexity. In this case binarySearch
has the same complexity as simple linear search. So what is the advantage?
Linear search performs O(n)
comparisons, as opposed to O(log n)
with binarySearch
without random access. This is especially important when the constant before O(logn)
is high. In plain English: when single comparison has a very high cost compared to advancing iterator. This might be quite common scenario, so limiting the number of comparisons is beneficial. Profit!
Binary search is not suited for linked lists. The algorithm is supposed to benefit from a sorted collection with random access (like a plain array), where it can quickly jump from one element to another, splitting the remaining search space in two on each iteration (hence the O(log N)
time complexity).
For a linked list, there is a modified version which iterates through all elements (and needs to go through 2n
elements in the worst case), but instead of comparing every element, it "probes" the list at specified positions only (hence doing a lower number of comparisons compared to a linear search).
Since comparisons are usually slightly more costly compared to plain pointer iterating, total time should be lower. That is why the log N
part is emphasized separately.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With