How do I search for a number in a 2d array sorted left to right and top to bottom?

Tags: algorithm, multidimensional-array, search

I was recently given this interview question and I'm curious what a good solution to it would be.

Say I'm given a 2d array where all the numbers in the array are in increasing order from left to right and top to bottom.

What is the best way to search and determine if a target number is in the array?

Now, my first inclination is to utilize a binary search since my data is sorted. I can determine if a number is in a single row in O(log N) time. However, it is the 2 directions that throw me off.

Another solution I thought might work is to start somewhere in the middle. If the middle value is less than my target, then I can be sure it is in the left square portion of the matrix from the middle. I then move diagonally and check again, reducing the size of the square that the target could potentially be in until I have homed in on the target number.

Does anyone have any good ideas on solving this problem?

Example array:

Sorted left to right, top to bottom.

    1  2  4  5  6
    2  3  5  7  8
    4  6  8  9  10
    5  8  9  10 11

asked Mar 16 '10 20:03

Phukab

1 Answer

Here's a simple approach:

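1. Start at the bottom-left corner.
2. If the target is less than that value, it must be above us, so move up one.
3. If the target is greater than that value, it can't be in that column, so move right one.
4. Go to step 2, stopping when we hit the target or step off the edge of the array (in which case the target isn't there).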

For an NxM array, this runs in O(N+M). I think it would be difficult to do better. :)
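
As a minimal sketch of the walk described above, here it is in Python. The function name `saddleback_search` and the list-of-lists representation are assumptions made for illustration; the grid is the example from the question.

```python
from typing import Sequence


def saddleback_search(grid: Sequence[Sequence[int]], target: int) -> bool:
    """Return True if target occurs in grid (rows and columns sorted ascending)."""
    if not grid or not grid[0]:
        return False
    row, col = len(grid) - 1, 0              # start at the bottom-left corner
    while row >= 0 and col < len(grid[0]):
        value = grid[row][col]
        if value == target:
            return True
        if target < value:
            row -= 1                         # the rest of this row is even larger
        else:
            col += 1                         # the rest of this column (above us) is smaller
    return False                             # walked off the array without a match


example = [
    [1, 2, 4, 5, 6],
    [2, 3, 5, 7, 8],
    [4, 6, 8, 9, 10],
    [5, 8, 9, 10, 11],
]
print(saddleback_search(example, 7))    # True
print(saddleback_search(example, 12))   # False
```

Each iteration either moves up one row or right one column, so the loop runs at most N+M times.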

Edit: Lots of good discussion. I was talking about the general case above; clearly, if N or M is small, you could use a binary search approach to do this in something approaching logarithmic time.

Here are some details, for those who are curious:

History

This simple algorithm is called a Saddleback Search. It's been around for a while, and it is optimal when N == M. Some references:

David Gries, The Science of Programming. Springer-Verlag, 1989.
Edsger Dijkstra, The Saddleback Search. Note EWD-934, 1985.

However, when N < M, intuition suggests that binary search should be able to do better than O(N+M): for example, when N == 1, a pure binary search will run in logarithmic rather than linear time.

Worst-case bound

Richard Bird examined this intuition that binary search could improve the Saddleback algorithm in a 2006 paper:

Richard S. Bird, Improving Saddleback Search: A Lesson in Algorithm Design, in Mathematics of Program Construction, pp. 82--89, volume 4014, 2006.

Using a rather unusual conversational technique, Bird shows us that for N <= M, this problem has a lower bound of Ω(N * log(M/N)). This bound makes sense, as it gives us linear performance when N == M and logarithmic performance when N == 1.

Algorithms for rectangular arrays

One approach that uses a row-by-row binary search looks like this:

1. Start with a rectangular array where N < M. Let's say N is rows and M is columns.
2. Do a binary search on the middle row for value. If we find it, we're done.
3. Otherwise we've found an adjacent pair of numbers s and g, where s < value < g.
4. The rectangle of numbers above and to the left of s is less than value, so we can eliminate it.
5. The rectangle below and to the right of g is greater than value, so we can eliminate it.
6. Go to step 2 for each of the two remaining rectangles.
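
The steps above can be sketched in Python roughly as follows. This is an illustrative version, not the code that was benchmarked; the name `recursive_search` and the half-open index bounds are assumptions. The standard-library `bisect_left` does the binary search on the middle row.

```python
from bisect import bisect_left
from typing import Sequence


def recursive_search(grid: Sequence[Sequence[int]], target: int) -> bool:
    """Row-by-row binary search on a grid sorted along rows and columns."""
    if not grid or not grid[0]:
        return False

    def search(top: int, bottom: int, left: int, right: int) -> bool:
        # Examine rows top..bottom-1 and columns left..right-1.
        if top >= bottom or left >= right:
            return False
        mid = (top + bottom) // 2
        row = grid[mid]
        # First column in [left, right) whose value is >= target:
        # s sits just before this column, g sits at it.
        split = bisect_left(row, target, left, right)
        if split < right and row[split] == target:
            return True
        # Everything above-left of s is too small and everything
        # below-right of g is too large, so two rectangles remain.
        return (search(top, mid, split, right)            # above the middle row, right part
                or search(mid + 1, bottom, left, split))  # below the middle row, left part

    return search(0, len(grid), 0, len(grid[0]))


example = [
    [1, 2, 4, 5, 6],
    [2, 3, 5, 7, 8],
    [4, 6, 8, 9, 10],
    [5, 8, 9, 10, 11],
]
print(recursive_search(example, 7))    # True
print(recursive_search(example, 12))   # False
```

Every call discards the middle row, so the recursion ends after at most N levels of row halving.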

In terms of worst-case complexity, this algorithm does log(M) work to eliminate half the possible solutions, and then recursively calls itself twice on two smaller problems. We do have to repeat a smaller version of that log(M) work for every row, but if the number of rows is small compared to the number of columns, then being able to eliminate all of those columns in logarithmic time starts to become worthwhile.

This gives the algorithm a complexity of T(N,M) = log(M) + 2 * T(N/2, M/2), which Bird shows to be O(N * log(M/N)).
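
For readers who want to see where that bound comes from, here is a rough unrolling of the recurrence, reading the recursive term as T(N/2, M/2). It assumes N is a power of two, logarithms base 2, and a base case T(1, M) = log M (a single binary search); these are simplifying assumptions for illustration, not Bird's actual argument.

```latex
\begin{align*}
T(N, M) &= \log M + 2\,T(N/2,\, M/2) \\
        &= \sum_{k=0}^{L-1} 2^k \log\frac{M}{2^k} \;+\; 2^L\, T(1,\, M/N),
           \qquad L = \log N \\
        &= \sum_{k=0}^{L-1} 2^k (\log M - k) \;+\; N \log\frac{M}{N} \\
        &= (N-1)\log M - \bigl((L-2)N + 2\bigr) + N \log\frac{M}{N} \\
        &= 2N \log\frac{M}{N} + 2N - \log M - 2
\end{align*}
```

The fourth line uses the identities \sum_{k<L} 2^k = N - 1 and \sum_{k<L} k\,2^k = (L-2)N + 2. The result is O(N + N log(M/N)), which is O(N log(M/N)) once M is at least 2N and collapses to linear time when N == M.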

Another approach posted by Craig Gidney describes an algorithm similar to the approach above: it examines a row at a time using a step size of M/N. His analysis shows that this results in O(N * log(M/N)) performance as well.
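
One way the step-size idea could look in Python is sketched below. This is a guess at the general shape written for illustration, not Gidney's code or the "hybrid" implementation that was measured; the name `strided_search` and the choice to walk from the bottom-left corner are assumptions. It skips blocks of about M/N columns at a time and only binary-searches inside a block when the block might contain the target.

```python
from bisect import bisect_left
from typing import Sequence


def strided_search(grid: Sequence[Sequence[int]], target: int) -> bool:
    """Saddleback-style walk that skips columns in blocks of about M/N."""
    if not grid or not grid[0]:
        return False
    n_rows, n_cols = len(grid), len(grid[0])
    step = max(1, n_cols // n_rows)           # block width; 1 for square or tall grids
    row, col = n_rows - 1, 0                  # start at the bottom-left corner
    while row >= 0 and col < n_cols:
        end = min(col + step, n_cols)         # current block is columns col..end-1
        if grid[row][end - 1] < target:
            col = end                         # block, and all rows above it, are too small
            continue
        # The block's last value is >= target, so bisect_left lands inside the block.
        pos = bisect_left(grid[row], target, col, end)
        if grid[row][pos] == target:
            return True
        col = pos                             # columns before pos are too small in rows above too
        row -= 1                              # the rest of this row is larger than target
    return False


wide = [
    [1, 3, 5, 7, 9, 11, 13, 15],
    [2, 4, 6, 8, 10, 12, 14, 16],
]
print(strided_search(wide, 9))    # True
print(strided_search(wide, 17))   # False
```

With N <= M there are roughly N block skips at constant cost and at most N row eliminations costing about log(M/N) each, which is consistent with the bound quoted above.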

Performance Comparison

Big-O analysis is all well and good, but how well do these approaches work in practice? The chart below examines four algorithms for increasingly "square" arrays:

[Figure: algorithm performance vs squareness (https://i.stack.imgur.com/SZwvl.png)]

(The "naive" algorithm simply searches every element of the array. The "recursive" algorithm is described above. The "hybrid" algorithm is an implementation of Gidney's algorithm. For each array size, performance was measured by timing each algorithm over a fixed set of 1,000,000 randomly-generated arrays.)

Some notable points:

As expected, the "binary search" algorithms offer the best performance on rectangular arrays and the Saddleback algorithm works the best on square arrays.
The Saddleback algorithm performs worse than the "naive" algorithm for 1-d arrays, presumably because it does multiple comparisons on each item.
The performance hit that the "binary search" algorithms take on square arrays is presumably due to the overhead of running repeated binary searches.

Summary

Clever use of binary search can provide O(N * log(M/N)) performance for both rectangular and square arrays. The O(N + M) "saddleback" algorithm is much simpler, but suffers from performance degradation as arrays become increasingly rectangular.

answered Sep 24 '22 12:09

Nate Kohl
