Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How fast can you make linear search?

I'm looking to optimize this linear search:

static int linear (const int *arr, int n, int key) {         int i = 0;         while (i < n) {                 if (arr [i] >= key)                         break;                 ++i;         }         return i; } 

The array is sorted and the function is supposed to return the index of the first element that is greater or equal to the key. They array is not large (below 200 elements) and will be prepared once for a large number of searches. Array elements after the n-th can if necessary be initialized to something appropriate, if that speeds up the search.

No, binary search is not allowed, only linear search.

Edit: All my knowledge about this topic is now summarized in this blog post.

like image 768
Mark Probst Avatar asked Apr 30 '10 01:04

Mark Probst


People also ask

Is linear search fast?

Linear search is the same or slightly faster for arrays of less than about 100 integers, since it is simpler than a binary search. This ignores the cost of sorting the array, so the advantage could be slightly larger for real programs.

What is the time complexity of linear search?

Time Complexity In linear search, best-case complexity is O(1) where the element is found at the first index. Worst-case complexity is O(n) where the element is found at the last index or element is not present in the array. In binary search, best-case complexity is O(1) where the element is found at the middle index.

Is linear search ever faster than binary?

Binary search is faster than linear search except for small arrays. However, the array must be sorted first to be able to apply binary search. There are specialized data structures designed for fast searching, such as hash tables, that can be searched more efficiently than binary search.


1 Answers

So far you received multiple advice most of which state that linear search makes no sense on sorted data, when binary search will work much more efficiently instead. This often happens to be one of those popular "sounds right" assertions made by people who don't care to give the problem too much thought. In reality, if you consider the bigger picture, given the right circumstances, linear search can be much more efficient than binary search.

Note, that if we consider a single search query for a sorted array, binary search is significantly more efficient method than linear search. There's no argument about that. Also, when you perform multiple completely random queries to the same data binary search still wins over linear search.

However, the picture begins to change if we consider sequential search queries and these queries are not exactly random. Imagine that queries arrive in sorted order, i.e. each next query is for a higher value than the previous query. I.e. the queries are also sorted. BTW, they don't have to be globally and strictly sorted, from time to time the query sequence might get "reset", i.e. a low value is queried, but on average the consequent queries should arrive in increasing order. In other words, the queries arrive in series, each series sorted in ascending order. In this case, if the average length of the series is comparable to the length of your array, linear search will outperform binary search by a huge margin. However, to take advantage of this situation, you have to implement your search in incremental manner. It is simple: if the next query is greater than the previous one, you don't need to start the search from the beginning of the array. Instead, you can search from the point where the previous search stopped. The most simplistic implementation (just to illustrate the idea) might look as follows

static int linear(const int *arr, int n, int key) {   static int previous_key = INT_MIN;   static int previous_i = 0;    i = key >= previous_key ? previous_i : 0;    while (i < n) {     if (arr[i] >= key)       break;     ++i;   }    previous_key = key;   previous_i = i;    return i; } 

(Disclaimer: the above implementation is terribly ugly for the obvious reason that the array is arriving from outside as a parameter, while the previous search state is stored internally. Of course, this is wrong way to do it in practice. But again, the above is intended to illustrate the idea and no more).

Note, that the complexity of processing each series of ordered queries using the above approach is always O(N), regardless of the length of the series. Using the binary search, the complexity would be O(M * log N). So, for obvious reasons when M is close to N, i.e. queries arrive in sufficiently long ordered series, the above linear search will significantly outperform binary search, while for small M the binary search will win.

Also, even if the ordered series of queries are not very long, the above modification might still give you a noticeable improvement in search performance, considering that you have to use linear search.

P.S. As an additional piece of information about the structure of the problem:

When you need to perform the search in an ordered array of length N and you know in advance that the queries will arrive in ordered series of [approximate, average] length M, the optimal algorithm will look as follows

  1. Calculate the stride value S = [N/M]. It might also make sense to "snap" the value of S to the [nearest] power of 2. Think of your sorted array as a sequence of blocks of length S - so called S-blocks.
  2. After receiving a query, perform incremental linear search for the S-block that potentially contains the queried value, i.e. it is an ordinary linear search with stride S (of course, remember to start from the block where the previous search left off).
  3. After finding the S-block, perform the binary search within the S-block for the queried value.

The above is the most optimal incremental search algorithm possible, in a sense that it achieves the theoretical limit on the asymptotic efficiency of repetitive search. Note, that if the value of M is much smaller then N, the algorithm "automatically" shifts itself towards binary search, while when M gets close to N the algorithm "automatically" favors linear search. The latter makes sense because in such environment linear search is significantly more efficient than binary search.

This all is just to illustrate the fact that blanket statements like "linear search on a sorted array is always useless" indicate nothing else than lack of knowledge on the part of those who make such statements.

like image 125
AnT Avatar answered Sep 26 '22 03:09

AnT