Is there a data structure representing a large set S of (64-bit) integers, that starts out empty and supports the following two operations:

insert(s) inserts the number s into S;
minmod(m) returns the number s in S such that s mod m is minimal.

An example:

insert(11)
insert(15)
minmod(7) -> the answer is 15 (which mod 7 = 1)
insert(14)
minmod(7) -> the answer is 14 (which mod 7 = 0)
minmod(10) -> the answer is 11 (which mod 10 = 1)
I am interested in minimizing the maximal total time spent on a sequence of n such operations. It is obviously possible to just maintain a list of elements for S and iterate through them for every minmod operation; then insert is O(1) and minmod is O(|S|), which would take O(n^2) time for n operations (e.g., n/2 insert operations followed by n/2 minmod operations would take roughly n^2/4 operations).
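For reference, a minimal sketch of that naive baseline (the class and method names are mine, not part of the question):

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Naive baseline: S is a plain vector; insert is O(1), minmod scans all of S.
class NaiveSet {
    std::vector<uint64_t> elems;
public:
    void insert(uint64_t s) { elems.push_back(s); }

    // Returns the element of S whose value mod m is minimal.
    uint64_t minmod(uint64_t m) const {
        if (elems.empty()) throw std::runtime_error("minmod on empty set");
        uint64_t best = elems[0];
        for (uint64_t s : elems)
            if (s % m < best % m) best = s;
        return best;
    }
};
```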
So: is it possible to do better than O(n^2) for a sequence of n operations? Maybe O(n sqrt(n)) or O(n log(n))? If this is possible, then I would also be interested to know if there are data structures that additionally admit removing single elements from S, or removing all numbers within an interval.
Another idea based on a balanced binary search tree, as in Keith's answer.

Suppose all elements inserted so far are stored in a balanced BST, and we need to compute minmod(m). Consider our set S as a union of subsets of numbers lying in the intervals [0, m-1], [m, 2m-1], [2m, 3m-1], etc. The answer will obviously be among the minimal numbers we have in each of those intervals. So we can successively look up the tree to find the minimal number in each interval. This is easy to do: for example, to find the minimal number in [a, b], we move left if the current value is greater than a, and right otherwise, keeping track of the minimal value in [a, b] we have met so far.
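A minimal sketch of this interval walk on top of std::set, which is itself a balanced BST; set::lower_bound plays the role of the "minimal number in [a, b]" lookup described above (the function name and the std::set choice are mine):

```cpp
#include <cstdint>
#include <optional>
#include <set>

// minmod over a std::set: visit each interval [k*m, (k+1)*m - 1] via a
// successor query; the smallest element in an interval has that interval's
// smallest residue. Empty intervals are skipped by jumping straight to the
// interval that contains the successor we found.
std::optional<uint64_t> minmod(const std::set<uint64_t>& S, uint64_t m) {
    std::optional<uint64_t> best;
    uint64_t base = 0;                      // left endpoint k*m of the current interval
    while (true) {
        auto it = S.lower_bound(base);      // smallest element >= base, O(log n)
        if (it == S.end()) break;           // nothing at or above base
        uint64_t cand = *it;
        if (!best || cand % m < *best % m) best = cand;
        if (cand % m == 0) break;           // residue 0 cannot be improved
        uint64_t start = cand - cand % m;   // left endpoint of cand's interval
        if (start > UINT64_MAX - m) break;  // cand's interval is the last one
        base = start + m;                   // continue with the next interval
    }
    return best;
}
```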
Now, if we suppose that m is uniformly distributed in [1, 2^64], let's calculate the expected number of queries we'll need.
For all m in [2^63, 2^64-1] we'll need 2 queries. The probability of this is 1/2.
For all m in [2^62, 2^63-1] we'll need 4 queries. The probability of this is 1/4.
...
The mathematical expectation will be sum[ 1/(2^k) * 2^k ], for k in [1,64], which is 64 queries.
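Written out, generalising the two cases above to m in [2^(64-k), 2^(64-k+1)), which needs about 2^k interval queries with probability 1/2^k:

```latex
E[\text{queries}] \approx \sum_{k=1}^{64} \Pr\bigl[m \in [2^{64-k},\, 2^{65-k})\bigr] \cdot 2^{k}
                  = \sum_{k=1}^{64} \frac{1}{2^{k}} \cdot 2^{k} = 64 .
```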
So, to sum up, the average minmod(m) query complexity will be O(64 * log n). In general, if m has an unknown upper bound, this will be O(log m * log n). The BST update is, as is well known, O(log n), so the overall complexity in the case of n queries will be O(n * log m * log n).
Partial answer too big for a comment.
Suppose you implement S as a balanced binary search tree.

When you seek S.minmod(m), naively you walk the whole tree for each query, so a sequence of n operations costs O(n^2).
However, at a given time during the walk you have the best (lowest) result so far. You can use this to avoid checking a whole subtree when

bestSoFar < leftChild mod m

and

rightChild - leftChild < m - leftChild mod m

(reading leftChild and rightChild as the smallest and largest values in the subtree: the second condition says the subtree's value range does not cross a multiple of m, so its smallest residue is leftChild mod m; the first says that even this smallest residue cannot beat bestSoFar).

This will only help much if the typical spacing between the numbers in the set is smaller than typical values of m.
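A minimal sketch of such a pruned walk, with my own names, caching each subtree's smallest and largest values as lo/hi; the insert is left unbalanced for brevity, and only the best residue is tracked (returning the element itself is a straightforward extension):

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>

// Plain BST node; lo/hi cache the smallest and largest value in the subtree.
struct Node {
    uint64_t val, lo, hi;
    std::unique_ptr<Node> left, right;
    explicit Node(uint64_t v) : val(v), lo(v), hi(v) {}
};

void insert(std::unique_ptr<Node>& root, uint64_t s) {
    if (!root) { root = std::make_unique<Node>(s); return; }
    root->lo = std::min(root->lo, s);
    root->hi = std::max(root->hi, s);
    insert(s < root->val ? root->left : root->right, s);
}

// Walk the tree, skipping a subtree when it provably cannot beat `best`:
// its value range [lo, hi] does not cross a multiple of m, so its smallest
// residue is lo % m, and that residue already loses to best.
void minmodWalk(const Node* n, uint64_t m, uint64_t& best) {
    if (!n) return;
    uint64_t loRes = n->lo % m;
    if (best < loRes && n->hi - n->lo < m - loRes) return;   // prune
    best = std::min(best, n->val % m);
    minmodWalk(n->left.get(), m, best);
    minmodWalk(n->right.get(), m, best);
}

// Usage: uint64_t best = UINT64_MAX; minmodWalk(root.get(), m, best);
```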
Update the next morning...
Grigor has articulated my idea better and more fully, and shown how it works well for "large" m. He also shows how a "random" m is typically "large", so it works well.

Grigor's algorithm is so efficient for large m that one needs to think about the risk for much smaller m.

So it is clear that you need to think about the distribution of m and optimise for different cases if need be.

For example, it might be worth simply keeping track of the minimal residue for each very small m.
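For instance, a sketch of that idea, with a cutoff SMALL_M_LIMIT that I picked arbitrarily: every insert also updates a table of the best element per small modulus, so minmod for those m becomes O(1).

```cpp
#include <array>
#include <cstdint>
#include <optional>

constexpr uint64_t SMALL_M_LIMIT = 64;   // arbitrary cutoff for "very small" m

// bestForMod[m] is the element inserted so far with the smallest residue mod m.
std::array<std::optional<uint64_t>, SMALL_M_LIMIT + 1> bestForMod;

void onInsert(uint64_t s) {
    for (uint64_t m = 1; m <= SMALL_M_LIMIT; ++m)
        if (!bestForMod[m] || s % m < *bestForMod[m] % m)
            bestForMod[m] = s;
}

// minmod(m) for m <= SMALL_M_LIMIT reads bestForMod[m] in O(1); larger m fall
// back to the tree search. This costs O(SMALL_M_LIMIT) extra work per insert
// and would need extra bookkeeping to survive deletions.
```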
But suppose m ~ 2^32? Then the search algorithm (certainly as given, but also otherwise) needs to check 2^32 intervals, which may amount to searching the whole set anyway.