Given an unordered list of n nonnegative integers (with no guarantees about distribution or repetition), I want to be able to be given an integer possibly not in the list, and respond with the number of integers at least that large that are in the list. I have up to n^2 preprocessing time, and up to n*log(n) storage. Is this possible?
My not-good-enough solution is binary search (log n time, constant space).
Storing a map of all possible queries in a map would take up too much storage.
Edit: Partial solutions which require some assumption on the inputs, such as the distribution or max size of an integer, are also useful.
Edit: This is known as the predecessor/successor problem. There's a paper by Beame & Fich in which they construct a data structure that stores n element sets of integers from a universe of size N in O(n) space and performs predecessor queries in O(min{(log log N) / (log log log N), sqrt(log n / (log log n))}) time.
http://homes.cs.washington.edu/~beame/papers/stocpred.pdf
Edit - Bounty: None of the answers as of this morning are exactly what I'm looking for. N is not bounded. Integers are not necessarily under 32 bits. The largest element may be much larger than the number of elements. I assume no distribution on the inputs. Of the existing answers, I accepted Coffin's for the bounty because it covers the relatively large subset of problems where I do have the distribution.
Assuming your elements are reasonably evenly distributed (or at least follow some distribution fairly closely) the obvious method would be to sort the data, then use an interpolating search instead of a binary search.
An interpolating search typically has roughly O(log log n) complexity.
Oh, in case it's not obvious from the name, the basic idea of an interpolating search is to use interpolation to guess at the approximate location of the element you're searching for. If, for example, you're dealing with integers between 0 and 100,000, and you need to find, say, 21,000, you can start at a location about 21% of the way into the array instead of starting from the halfway point. Then, based on the value you find there, you can use interpolation to find a better guess, and so on.
That example is for linear interpolation, but the same basic idea applies to other distributions just as well--you just have to use a function that fits (reasonably well) with the data's distribution.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With