How fast can you make linear search?

Tags:

I'm looking to optimize this linear search:

static int linear (const int *arr, int n, int key) {         int i = 0;         while (i < n) {                 if (arr [i] >= key)                         break;                 ++i;         }         return i; }

The array is sorted and the function is supposed to return the index of the first element that is greater or equal to the key. They array is not large (below 200 elements) and will be prepared once for a large number of searches. Array elements after the n-th can if necessary be initialized to something appropriate, if that speeds up the search.

No, binary search is not allowed, only linear search.

Edit: All my knowledge about this topic is now summarized in this blog post.

768

asked Apr 30 '10 01:04

Mark Probst

1 Answers

So far you received multiple advice most of which state that linear search makes no sense on sorted data, when binary search will work much more efficiently instead. This often happens to be one of those popular "sounds right" assertions made by people who don't care to give the problem too much thought. In reality, if you consider the bigger picture, given the right circumstances, linear search can be much more efficient than binary search.

Note, that if we consider a single search query for a sorted array, binary search is significantly more efficient method than linear search. There's no argument about that. Also, when you perform multiple completely random queries to the same data binary search still wins over linear search.

However, the picture begins to change if we consider sequential search queries and these queries are not exactly random. Imagine that queries arrive in sorted order, i.e. each next query is for a higher value than the previous query. I.e. the queries are also sorted. BTW, they don't have to be globally and strictly sorted, from time to time the query sequence might get "reset", i.e. a low value is queried, but on average the consequent queries should arrive in increasing order. In other words, the queries arrive in series, each series sorted in ascending order. In this case, if the average length of the series is comparable to the length of your array, linear search will outperform binary search by a huge margin. However, to take advantage of this situation, you have to implement your search in incremental manner. It is simple: if the next query is greater than the previous one, you don't need to start the search from the beginning of the array. Instead, you can search from the point where the previous search stopped. The most simplistic implementation (just to illustrate the idea) might look as follows

static int linear(const int *arr, int n, int key) {   static int previous_key = INT_MIN;   static int previous_i = 0;    i = key >= previous_key ? previous_i : 0;    while (i < n) {     if (arr[i] >= key)       break;     ++i;   }    previous_key = key;   previous_i = i;    return i; }

(Disclaimer: the above implementation is terribly ugly for the obvious reason that the array is arriving from outside as a parameter, while the previous search state is stored internally. Of course, this is wrong way to do it in practice. But again, the above is intended to illustrate the idea and no more).

Note, that the complexity of processing each series of ordered queries using the above approach is always O(N), regardless of the length of the series. Using the binary search, the complexity would be O(M * log N). So, for obvious reasons when M is close to N, i.e. queries arrive in sufficiently long ordered series, the above linear search will significantly outperform binary search, while for small M the binary search will win.

Also, even if the ordered series of queries are not very long, the above modification might still give you a noticeable improvement in search performance, considering that you have to use linear search.

P.S. As an additional piece of information about the structure of the problem:

When you need to perform the search in an ordered array of length N and you know in advance that the queries will arrive in ordered series of [approximate, average] length M, the optimal algorithm will look as follows

Calculate the stride value S = [N/M]. It might also make sense to "snap" the value of S to the [nearest] power of 2. Think of your sorted array as a sequence of blocks of length S - so called S-blocks.
After receiving a query, perform incremental linear search for the S-block that potentially contains the queried value, i.e. it is an ordinary linear search with stride S (of course, remember to start from the block where the previous search left off).
After finding the S-block, perform the binary search within the S-block for the queried value.

The above is the most optimal incremental search algorithm possible, in a sense that it achieves the theoretical limit on the asymptotic efficiency of repetitive search. Note, that if the value of M is much smaller then N, the algorithm "automatically" shifts itself towards binary search, while when M gets close to N the algorithm "automatically" favors linear search. The latter makes sense because in such environment linear search is significantly more efficient than binary search.

This all is just to illustrate the fact that blanket statements like "linear search on a sorted array is always useless" indicate nothing else than lack of knowledge on the part of those who make such statements.

125

answered Sep 26 '22 03:09

AnT

Related questions
                            
                                Put name of parameters in C function prototypes?
                            
                                Check if socket is connected or not [duplicate]
                            
                                "Static const" vs "#define" for efficiency in C
                            
                                Do parentheses make a difference when determining the size of an array?
                            
                                How do I best handle dynamic multi-dimensional arrays in C/C++?
                            
                                understanding C namespaces
                            
                                What's the major difference between "union" and "struct" in C.? [duplicate]
                            
                                Is malloc deterministic?
                            
                                Does C have an equivalent of std::less from C++?
                            
                                Regex to pull out C function prototype declarations?
                            
                                Window C/C++ Crypto API Examples and tips
                            
                                How to check if the program is run from a console?
                            
                                How to convert int to float in C?
                            
                                How to get the absolute path for a given relative path programmatically in Linux?
                            
                                Difference between "return 0" and "exit (0)" [duplicate]
                            
                                How to move C functions into separate file?
                            
                                Invalid read of size 8 - Valgrind + C
                            
                                Can I kill a process from itself?
                            
                                Implementation of __builtin_clz
                            
                                How to use C source files in a C++ project?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How fast can you make linear search?

Tags:

c

optimization

search

simd

linear-search

Mark Probst

People also ask

1 Answers

AnT

Recent Activity

Donate For Us