I have read through some tutorials about two common data structure which can achieve range update and query in O(lg N): Segment tree and Binary Indexed Tree (BIT / Fenwick Tree).
Most of the examples I have found is about some associative and commutative operation like "Sum of integers in a range", "XOR integers in a range", etc.
I wonder if these two data structures (or any other data structures / algorithm, please propose) can achieve the below query in O(lg N)? (If no, how about O(sqrt N))
Given an array of integer A, query the number of distinct integer in a range [l,r]
PS: Assuming the number of available integer is ~ 10^5, so used[color] = true
or bitmask is not possible
For example: A = [1,2,3,2,4,3,1], query([2,5]) = 3, where the range index is 0-based.
Yes, this is possible to do in O(log n), even if you should answer queries online. However, this requires some rather complex techniques.
First, let's solve the following problem: given an array, answer the queries of form "how many numbers <= x are there within indices [l, r]". This is done with a segment-tree-like structure which is sometimes called Merge Sort Tree. It is basically a segment tree where each node stores a sorted subarray. This structure requires O(n log n) memory (because there are log n layers and each of them requires storing n numbers). It is built in O(n log n) as well: you just go bottom-up and for each inner vertex merge sorted lists of its children.
Here is an example. Say 1 5 2 6 8 4 7 1
be an original array.
|1 1 2 4 5 6 7 8|
|1 2 5 6|1 4 7 8|
|1 5|2 6|4 8|1 7|
|1|5|2|6|8|4|7|1|
Now you can answer for those queries in O(log^2 n time): just make a reqular query to a segment tree (traversing O(log n) nodes) and make a binary search to know how many numbers <= x are there in that node (additional O(log n) from here).
This can be speed up to O(log n) using Fractional Cascading technique, which basically allows you to do the binary search not in each node but only in the root. However it is complex enough to be described in the post.
Now we return to the original problem. Assume you have an array a_1, ..., a_n. Build another array b_1, ..., b_n, where b_i = index of the next occurrence of a_i in the array, or ∞ if it is the last occurrence.
Example (1-indexed):
a = 1 3 1 2 2 1 4 1
b = 3 ∞ 6 5 ∞ 8 ∞ ∞
Now let's count numbers in [l, r]. For each unique number we'll count its last occurrence in the segment. With b_i notion you can see that the occurrence of the number is last if and only if b_i > r
. So the problem boils down to "how many numbers > r are there in the segment [l, r]" which is trivially reduced to what I described above.
Hope it helps.
If you're willing to answer queries offline, then plain old Segment Trees/ BIT can still help.
For each value in input array from left to right:
For current element, if it's been seen before, decrement by 1 in
segment tree at it's previous position.
Answer queries ending at current index i, by querying for sum in range [l, r == i].
The idea in short is to keep marking rightward indexes, the latest occurrence of each individual element, and setting previous occurrences back to 0. The sum of range would give the count of unique elements.
Overall time complexity again would be nLogn.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With