Algorithm to find 100 closest stars to the origin

Tags:

algorithm

First let me phrase the proper question:

Q: There is a file containing more than a million points (x,y) each of which represents a star. There is a planet earth at (a,b). Now, the task is to build an algorithm that would return the 100 closest stars to earth. What would be the time and space complexities of your algorithm.

This question has been asked many times in various interviews. I tried looking up the answers but could not find a satisfactory one.

One way to do it which I thought might be using a max heap of size 100. Calculate distances for each star and check if the distance is lesser than the root of the max heap. If yes, replace it with the root and call heapify.

Any other better/faster answers?

P.S: This is not a homework question.

943

asked Feb 08 '12 22:02

noMAD

2 Answers

You can actually do this in time O(n) and space O(k), where k is the number of closest points that you want, by using a very clever trick.

The selection problem is as follows: Given an array of elements and some index i, rearrange the elements of the array such that the ith element is in the right place, all elements smaller than the ith element are to the left, and all elements greater than the ith element are to the right. For example, given the array

40  10  00  30  20

If I tried to select based on index 2 (zero-indexed), one result might be

10  00  20  40  30

Since the element at index 2 (20) is in the right place, the elements to the left are smaller than 20, and the elements to the right are greater than 20.

It turns out that since this is a less strict requirement than actually sorting the array, it's possible to do this in time O(n), where n is the number of elements of the array. Doing so requires some complex algorithms like the median-of-medians algorithm, but is indeed O(n) time.

So how do you use this here? One option is to load all n elements from the file into an array, then use the selection algorithm to select the top k in O(n) time and O(n) space (here, k = 100).

But you can actually do better than this! For any constant k that you'd like, maintain a buffer of 2k elements. Load 2k elements from the file into the array, then use the selection algorithm to rearrange it so that the smallest k elements are in the left half of the array and the largest are in the right, then discard the largest k elements (they can't be any of the k closest points). Now, load k more elements from the file into the buffer and do this selection again, and repeat this until you've processed every line of the file. Each time you do a selection you discard the largest k elements in the buffer and retain the k closest points you've seen so far. Consequently, at the very end, you can select the top k elements one last time and find the top k.

What's the complexity of the new approach? Well, you're using O(k) memory for the buffer and the selection algorithm. You end up calling select on a buffer of size O(k) a total of O(n / k) times, since you call select after reading k new elements. Since select on a buffer of size O(k) takes time O(k), the total runtime here is O(n + k). If k = O(n) (a reasonable assumption), this takes time O(n), space O(k).

Hope this helps!

132

answered Oct 08 '22 14:10

templatetypedef

To elaborate on the MaxHeap solution you would build a max-heap with the first k elements from the file ( k = 100 in this case ).

The key for the max-heap would be its distance from Earth (a,b). Distance between 2 points on a 2d plane can be calculated using:

dist = (x1,y1) to (x2,y2) = square_root((x2 - x1)^2 + (y2 - y1)^2);

This would take O(k) time to construct. For every subsequent element from k to n. ie (n - k) elements you need to fetch its distance from earth and compare it with the top of max-heap. If the new element to be inserted is closer to earth than the top of the max-heap, replace the top of the max-heap and call heapify on the new root of the heap.

This would take O((n-k)logk) time to complete. Finally we would be left with just the k elements in the max-heap. You can call heapify k times to return all these k elements. This is another O(klogk).

Overall time complexity would be O(k + (n-k)logk + klogk).

answered Oct 08 '22 13:10

Abhi Tk

Related questions
                            
                                Iterative tree walking
                            
                                KDTree Splitting
                            
                                Finding connected components of adjacency matrix graph
                            
                                Finding taxicab Numbers
                            
                                Greatest distance between set of longitude/latitude points
                            
                                Split array into pieces of X length
                            
                                How to determine if two web pages are the same?
                            
                                Next month, same day in PHP
                            
                                Algorithm to find the smallest snippet from searching a document?
                            
                                Find the pair across 2 arrays with kth largest sum [closed]
                            
                                Largest rectangular sub matrix with the same number
                            
                                Efficient scheduling of university courses
                            
                                What is a Bw-tree?
                            
                                Word frequency in a large text file
                            
                                Is it possible to evaluate lambda calculus terms efficiently?
                            
                                How do I reverse a UTF-8 string in place?
                            
                                How can I check Hamming Weight without converting to binary?
                            
                                Combining MD5 hash values
                            
                                How does bubble sort compare to selection sort?
                            
                                Find word in dictionary of unknown size using only a method to get a word by index

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With