
Time complexity when sorting is done before binary searching

Suppose I have an array of unsorted data and must choose between linear search and binary search. Which should I choose? Linear search takes O(n) time and binary search takes O(log n), but binary search needs sorted input, and the fastest comparison-based sorting algorithms take O(n * log n). I don't know how to "add" the complexities of two algorithms (if that's the right word), hence this question.

So my question is: is sorting and then binary searching better than simply linear searching, or is it the other way around?

Also, how do I prove whichever is the case using big O notation (I mean "adding" and "comparing" the time complexities)?

Thank you so much for reading!!! It means a lot.

finitenessofinfinity asked Feb 11 '13 01:02





2 Answers

You don't really "add" the complexities. Sorting is, as you say, O(n * log n), and a binary search is O(log n). If you were to do "normal math" on them, the total would be (n+1) * log n, which is still O(n * log n).

When you're performing multiple steps like that, you typically keep the dominant term and report that as the overall complexity. After all, when n is sufficiently large, n*log n dwarfs log n.
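For the "how do I prove it" part of the question, one way to make the dominant-term argument precise is sketched below. It assumes the usual definition of big O (some constant c and threshold n_0 such that the bound holds for all n at or beyond n_0); the particular constant chosen here is just one that happens to work.

```latex
\[
  n\log n + \log n \;=\; (n+1)\log n \;\le\; 2n\log n
  \qquad \text{for all } n \ge 1,
\]
\[
  \text{so } n\log n + \log n = O(n\log n) \text{ with } c = 2,\ n_0 = 1 .
\]
```

The inequality holds because n + 1 is at most 2n whenever n is at least 1, and log n is non-negative there.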

Think of it this way: when n is 1,000,000, n*log n is about 20 million and log n is about 20 (taking logs base 2). So what's the difference between 20,000,000 and 20,000,020? The (log n) term is irrelevant. So (n log n) + (log n) is, for all intents and purposes, equal to (n log n). Even when n is 100, log n is only about 7, against an n*log n of about 664. The (log n) term just won't make a difference when n is even moderately large.
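The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up for the example, logs are taken base 2, and the "costs" are idealized step counts that ignore constant factors.

```python
import math

def sort_only_cost(n):
    """Idealized step count for the sort alone: n * log2(n)."""
    return n * math.log2(n)

def sort_then_search_cost(n):
    """Idealized step count for one sort plus one binary search."""
    return n * math.log2(n) + math.log2(n)

for n in (100, 10_000, 1_000_000):
    total = sort_then_search_cost(n)
    share = math.log2(n) / total  # fraction of the total due to the search
    print(f"n={n:>9,}  sort={sort_only_cost(n):>14,.0f}  "
          f"total={total:>14,.0f}  search share={share:.6%}")
```

The search's share of the total works out to 1/(n+1), which is why it vanishes so quickly as n grows.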

In your particular case, if you only need to search the list one time, then sequential search is the way to go. If you need to search it multiple times, then you have to weigh the cost of m sequential searches, O(m * n), against the cost of sorting once and then doing m binary searches, O(n * log n + m * log n). If you're interested in the minimum time and you know how many times you'll be searching the list, then you'd use sequential search if (m * n) is less than ((n + m) * log n), which for large n is roughly when m is less than log n. Otherwise sort and then use binary search.
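To get a feel for where the break-even point falls, here is a rough sketch that compares the two idealized costs directly. The helper names are hypothetical, logs are base 2, and constant factors are ignored, so real measurements will shift the exact crossover.

```python
import math

def linear_cost(n, m):
    """Idealized cost of m sequential searches over n items."""
    return m * n

def sorted_cost(n, m):
    """Idealized cost of one sort plus m binary searches."""
    return (n + m) * math.log2(n)

def break_even_searches(n):
    """Smallest m at which sorting first is estimated to be no more expensive."""
    m = 1
    while linear_cost(n, m) < sorted_cost(n, m):
        m += 1
    return m

for n in (100, 1024, 1_000_000):
    print(n, break_even_searches(n))
```

Under these assumptions the crossover sits just above log2(n) searches, for example around 20 searches for a million items.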

But that's not the only consideration. Binary search on a sorted list gives you very quick response time, whereas linear search can take a very long time for a single item. If you can afford to sort the list during program startup then that's probably the best way to go, because items will be found (or not found) much faster once the program is operating. It's better to pay the price of sorting during startup than to experience unpredictable response times during operation, or to find out later that you need to do more searches than you thought.

Jim Mischel answered Sep 18 '22 13:09



If you have to do one search, do a linear search. It's clearly better than sorting and then binary searching.
But if you have multiple search queries, in most cases you should first sort the array and then apply a binary search to every query.
Why? Let's say you're going to perform k search queries. With linear search, you'll end up with O(n*k) operations. If you sort first, that will take O(n log n) + O(k log n) = O((n+k) log n) operations. Which is better? When k is very small (less than about log n), linear search is better. In most other cases you'd do better to sort first.

Grigor Gevorgyan answered Sep 18 '22 13:09
