
Is it possible to query number of distinct integers in a range in O(lg N)?

I have read through some tutorials about two common data structures that can achieve range update and query in O(lg N): the Segment Tree and the Binary Indexed Tree (BIT / Fenwick Tree).

Most of the examples I have found are about associative and commutative operations like "sum of integers in a range", "XOR of integers in a range", etc.

I wonder whether these two data structures (or any other data structure / algorithm, please suggest) can achieve the query below in O(lg N). (If not, how about O(sqrt N)?)

Given an array of integers A, query the number of distinct integers in a range [l, r].

PS: Assume the number of possible integer values is ~10^5, so a used[color] = true array or a bitmask is not feasible.

For example: A = [1,2,3,2,4,3,1], query([2,5]) = 3, where the range indices are 0-based.

asked Sep 30 '16 by shole

2 Answers

Yes, it is possible to do this in O(log n), even if you have to answer the queries online. However, it requires some rather complex techniques.

First, let's solve the following problem: given an array, answer queries of the form "how many numbers <= x are there within indices [l, r]". This is done with a segment-tree-like structure sometimes called a Merge Sort Tree. It is basically a segment tree where each node stores a sorted copy of its subarray. This structure requires O(n log n) memory (because there are log n layers and each of them stores n numbers in total). It is built in O(n log n) as well: you just go bottom-up and, for each inner vertex, merge the sorted lists of its children.

Here is an example. Let 1 5 2 6 8 4 7 1 be the original array.

|1 1 2 4 5 6 7 8|
|1 2 5 6|1 4 7 8|
|1 5|2 6|4 8|1 7|
|1|5|2|6|8|4|7|1|

Now you can answer those queries in O(log^2 n) time: just make a regular query to the segment tree (traversing O(log n) nodes) and, in each visited node, binary search to find how many numbers <= x it contains (an additional O(log n) factor).
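
A rough C++ sketch of this structure (the class name and layout are mine, not from the answer): each node keeps a sorted copy of its subarray, built bottom-up by merging the children, and countLE answers "how many numbers <= x in [l, r]" in O(log^2 n).

```cpp
#include <bits/stdc++.h>
using namespace std;

// Merge sort tree sketch: tree[node] holds the sorted copy of the subarray
// that node covers. Memory O(n log n), build O(n log n).
struct MergeSortTree {
    int n;
    vector<vector<int>> tree;

    MergeSortTree(const vector<int>& a) : n(a.size()), tree(4 * a.size()) {
        build(1, 0, n - 1, a);
    }

    void build(int node, int lo, int hi, const vector<int>& a) {
        if (lo == hi) { tree[node] = {a[lo]}; return; }
        int mid = (lo + hi) / 2;
        build(2 * node, lo, mid, a);
        build(2 * node + 1, mid + 1, hi, a);
        // Bottom-up construction: merge the children's sorted lists.
        merge(tree[2 * node].begin(), tree[2 * node].end(),
              tree[2 * node + 1].begin(), tree[2 * node + 1].end(),
              back_inserter(tree[node]));
    }

    // How many elements <= x are there in a[l..r]? O(log n) visited nodes,
    // one binary search (upper_bound) in each, so O(log^2 n) total.
    int countLE(int l, int r, int x) const {
        return countLE(1, 0, n - 1, l, r, x);
    }

    int countLE(int node, int lo, int hi, int l, int r, int x) const {
        if (r < lo || hi < l) return 0;
        if (l <= lo && hi <= r)
            return upper_bound(tree[node].begin(), tree[node].end(), x)
                   - tree[node].begin();
        int mid = (lo + hi) / 2;
        return countLE(2 * node, lo, mid, l, r, x)
             + countLE(2 * node + 1, mid + 1, hi, l, r, x);
    }
};
```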

This can be sped up to O(log n) using the Fractional Cascading technique, which basically allows you to do the binary search only in the root rather than in every node. However, it is too complex to describe in this post.

Now we return to the original problem. Assume you have an array a_1, ..., a_n. Build another array b_1, ..., b_n, where b_i = index of the next occurrence of a_i in the array, or ∞ if it is the last occurrence.

Example (1-indexed):

a = 1 3 1 2 2 1 4 1
b = 3 ∞ 6 5 ∞ 8 ∞ ∞

Now let's count the distinct numbers in [l, r]. For each distinct value we count only its last occurrence inside the segment. With the b_i notation you can see that the occurrence at index i is the last one within [l, r] if and only if b_i > r. So the problem boils down to "how many of b_l, ..., b_r are greater than r", which is trivially reduced to what I described above (the segment length minus the count of values <= r).
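
Continuing the sketch above (same includes; buildNext and distinctInRange are illustrative names, not from the answer), you would build the MergeSortTree over b rather than over a:

```cpp
// b[i] = index of the next occurrence of a[i], or a large sentinel ("infinity")
// if a[i] never occurs again.
vector<int> buildNext(const vector<int>& a) {
    int n = a.size();
    const int INF = INT_MAX;              // stands in for the answer's ∞
    vector<int> b(n, INF);
    unordered_map<int, int> next_pos;     // value -> closest later index seen so far
    for (int i = n - 1; i >= 0; --i) {
        auto it = next_pos.find(a[i]);
        if (it != next_pos.end()) b[i] = it->second;
        next_pos[a[i]] = i;
    }
    return b;
}

// Distinct values in a[l..r] (0-based) = number of i in [l, r] with b[i] > r
// = segment length minus the count of b[i] <= r.
int distinctInRange(const MergeSortTree& tOverB, int l, int r) {
    return (r - l + 1) - tOverB.countLE(l, r, r);
}
```

For the array from the question, A = [1,2,3,2,4,3,1], this gives b = [6, 3, 5, INF, INF, INF, INF], and distinctInRange over [2, 5] returns 4 - 1 = 3, matching the expected answer.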

Hope it helps.

answered Nov 16 '22 by Ivan Smirnov


If you're willing to answer queries offline, then a plain old Segment Tree / BIT can still help.

  • Sort the queries by their r values.
  • Build a Segment Tree for range sum queries over indices [0, n - 1].
  • For each value in the input array, from left to right:

    1. Increment by 1 at the current index i in the segment tree.
    2. If the current element has been seen before, decrement by 1 in the segment tree at its previous position.
    3. Answer all queries ending at the current index i by querying the sum in the range [l, r == i].

The idea, in short, is to sweep to the right while marking only the latest occurrence of each element and resetting its previous occurrence back to 0. The sum over a range then gives the count of distinct elements in it.
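
A hedged C++ sketch of this offline approach (Fenwick, distinctOffline and the other names are illustrative; queries are 0-based (l, r) pairs):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Fenwick tree (BIT) for point updates and prefix/range sums.
struct Fenwick {
    vector<int> t;
    Fenwick(int n) : t(n + 1, 0) {}
    void add(int i, int d) { for (++i; i < (int)t.size(); i += i & -i) t[i] += d; }
    int sum(int i) const { int s = 0; for (++i; i > 0; i -= i & -i) s += t[i]; return s; }
    int rangeSum(int l, int r) const { return sum(r) - (l ? sum(l - 1) : 0); }
};

vector<int> distinctOffline(const vector<int>& a, const vector<pair<int,int>>& queries) {
    int n = a.size(), q = queries.size();
    // Process queries in order of increasing right endpoint.
    vector<int> order(q), ans(q);
    iota(order.begin(), order.end(), 0);
    sort(order.begin(), order.end(),
         [&](int x, int y) { return queries[x].second < queries[y].second; });

    Fenwick fw(n);
    unordered_map<int, int> last;  // value -> index where it is currently marked
    int qi = 0;
    for (int i = 0; i < n; ++i) {
        auto it = last.find(a[i]);
        if (it != last.end()) fw.add(it->second, -1);  // unmark the previous occurrence
        fw.add(i, +1);                                 // mark the latest occurrence
        last[a[i]] = i;
        // Answer every query whose range ends at the current index i.
        while (qi < q && queries[order[qi]].second == i) {
            ans[order[qi]] = fw.rangeSum(queries[order[qi]].first, i);
            ++qi;
        }
    }
    return ans;
}
```

Called as distinctOffline({1,2,3,2,4,3,1}, {{2,5}}), this would return {3} for the question's example.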

Overall time complexity would again be O((n + q) log n) for n elements and q queries.

answered Nov 16 '22 by Gagandeep Kalra