 

CSES Range Query Question: Salary Queries

I'm trying to solve this problem: https://cses.fi/problemset/task/1144/

Given an array of up to 200,000 elements, my task is to process up to 200,000 queries, each of which either updates a single value in the array or asks for the number of elements whose values lie between a and b (for example, a query might ask how many of the array's elements have values in the range [2, 3]).

My current idea is to first apply index compression to the values in the given array (since values can be as large as 10^9, a plain occurrence array indexed by value would exceed the memory limit), then keep another array containing the number of occurrences of each compressed value. Queries and updates could then both be processed using a sum segment tree over that occurrence array.
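
For concreteness, here is roughly how I'd build the compression and the occurrence array (a minimal sketch; the names are just illustrative, not my actual solution):

#include <bits/stdc++.h>
using namespace std;

int main() {
    vector<int> arr = {1, 5, 3, 3, 2};

    // Collect and deduplicate the values.
    vector<int> sorted_vals = arr;
    sort(sorted_vals.begin(), sorted_vals.end());
    sorted_vals.erase(unique(sorted_vals.begin(), sorted_vals.end()),
                      sorted_vals.end());

    // C(v) = position of v among the distinct sorted values.
    auto C = [&](int v) {
        return lower_bound(sorted_vals.begin(), sorted_vals.end(), v)
               - sorted_vals.begin();
    };

    // Occurrence array over compressed indices.
    vector<int> occ(sorted_vals.size(), 0);
    for (int v : arr) occ[C(v)]++;

    for (int x : occ) cout << x << ' ';     // prints: 1 1 2 1
    cout << '\n';
}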

However, I ran into a problem while trying to implement this approach: updating a single array value can force me to change the compression mapping.

For example, given an array [1, 5, 3, 3, 2], I would define a compression function C such that

C[1] = 0;
C[2] = 1;
C[3] = 2;
C[5] = 3;

Then, the occurrence array would be [1, 1, 2, 1], and processing sum queries would be efficient. However, if I were instructed to update a value, say, change the third element to 4, everything is thrown off balance: the compression function would have to change to

C[1] = 0;
C[2] = 1;
C[3] = 2;
C[4] = 3;
C[5] = 4;

which would force me to reconstruct my occurrence array, resulting in O(N) update time.

Since N can be up to 200000, my current approach will not work efficiently enough to solve the problem, although I think I have the right idea with index compression. Can somebody please point me in the right direction with my method?

— Anthony Smith

3 Answers

You have the right idea in using index compression - great thinking! Since N is at most 200,000, an occurrence array over the compressed initial values needs at most 200,000 entries, instead of 10^9 array indices.

As you noted, the problem is that you may encounter new values while processing queries. You're right: that would throw the occurrence array off balance and force updates to run in O(N) time. The solution is just a tiny modification to your current method.

To avoid encountering new values, we can simply make sure there are none left to encounter: read in all the queries before constructing the sum segment tree. This leaves at most N + 2*Q unique values, or 600,000 in the worst case, which easily fits an occurrence array within the problem's 512 MB memory limit. After that, a sum segment tree can answer the queries efficiently.

So in the end, the strategy is: read every number that appears anywhere in the input, construct an index compression function over all of them, then use a sum segment tree on the compressed occurrence array to process the queries efficiently.

In the future, remember that in these sorts of query-answering questions, it might be useful to read in ALL the input before precomputation. Good luck with your program.
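
Here is a minimal sketch of that strategy (a Fenwick tree stands in for the sum segment tree, since both support point updates and prefix sums in O(log n); all names are illustrative):

#include <bits/stdc++.h>
using namespace std;

vector<long long> fen;                      // 1-indexed Fenwick tree

void update(int i, int delta) {             // add delta at compressed index i
    for (++i; i < (int)fen.size(); i += i & -i) fen[i] += delta;
}

long long prefix(int i) {                   // sum of counts at indices 0..i
    long long s = 0;
    for (++i; i > 0; i -= i & -i) s += fen[i];
    return s;                               // prefix(-1) safely returns 0
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q;
    cin >> n >> q;
    vector<int> sal(n);
    vector<int> vals;                       // every value we will ever see
    for (int &s : sal) { cin >> s; vals.push_back(s); }

    struct Query { char type; int a, b; };
    vector<Query> qs(q);
    for (auto &[t, a, b] : qs) {
        cin >> t >> a >> b;
        if (t == '!') vals.push_back(b);    // new salary from an update
        else { vals.push_back(a); vals.push_back(b); }  // query endpoints
    }

    // Coordinate compression over at most N + 2*Q values.
    sort(vals.begin(), vals.end());
    vals.erase(unique(vals.begin(), vals.end()), vals.end());
    auto C = [&](int v) {
        return int(lower_bound(vals.begin(), vals.end(), v) - vals.begin());
    };

    fen.assign(vals.size() + 1, 0);
    for (int s : sal) update(C(s), +1);     // initial occurrence counts

    for (auto &[t, a, b] : qs) {
        if (t == '!') {                     // employee a's salary becomes b
            update(C(sal[a - 1]), -1);
            sal[a - 1] = b;
            update(C(b), +1);
        } else {                            // count salaries in [a, b]
            cout << prefix(C(b)) - prefix(C(a) - 1) << '\n';
        }
    }
}

The same structure works with an explicit sum segment tree; the important parts are reading the queries offline and compressing every value before building the tree.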

— Telescope

First, consider the naive approach: for each update, simply overwrite the array entry; for each query, scan the entire array and count the matching elements. That gives O(1) updates but O(n) queries. No good.

We can come up with a different solution with arguably worse time complexity, but one that hints at the end result. Maintain the source array at all times, but also keep a hash map from value to frequency. On an update, decrement the frequency of the old value and increment the frequency of the new one. For a query, loop over every value in the query range [l, r] and sum the frequencies. This gives O(1) updates and O(r-l) queries: excellent updates but awful queries, since the value range can span 10^9. If we can just speed up those queries, though, we're done. Enter the segment tree.
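
To make that intermediate idea concrete, here is a sketch of the value-to-frequency map (illustrative names, not production code):

#include <bits/stdc++.h>
using namespace std;

unordered_map<int, int> freq;               // value -> how many elements hold it
vector<int> arr;

void update(int i, int newVal) {            // O(1)
    freq[arr[i]]--;
    arr[i] = newVal;
    freq[newVal]++;
}

long long query(int lo, int hi) {           // O(hi - lo): hopeless for 10^9
    long long count = 0;
    for (int v = lo; v <= hi; ++v) {
        auto it = freq.find(v);
        if (it != freq.end()) count += it->second;
    }
    return count;
}

int main() {
    arr = {1, 5, 3, 3, 2};
    for (int v : arr) freq[v]++;
    update(2, 4);                           // third element: 3 -> 4
    cout << query(2, 3) << '\n';            // prints 2 (one 2 and one 3)
}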

Traditionally, you'd construct a segment tree all the way down to its leaves upon creation. Here, though, we'd nominally want a segment tree over the range 0 to 10^9, and there's no way we could allocate that much memory (and we'd time out trying). So instead, what if we create a segment tree where a node's children are implicit until they've actually been used? That is, don't create child nodes until elements fall into them. This structure is aptly named the implicit segment tree. The idea is to implement your segment tree as normal, except skip the part of the constructor that initializes the left and right children. When a partial range query requires you to descend into a child, check whether it exists and create it if it doesn't; otherwise, since it was never needed, the sum of the values in that subtree is 0.

The final solution is as follows: create an implicit segment tree over the maximum queryable value (if you don't have to answer interactively, you can save and scan the queries to find the maximum r-value, but you don't have to). Maintain the source array after every update, and perform point updates on the tree, which cost O(log(max value)). Queries are regular segment-tree range queries, so they're also O(log(max value)). And there it is!
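
Here's a minimal sketch of such an implicit segment tree (nodes are allocated lazily on first touch; the names are illustrative, not a drop-in solution):

#include <bits/stdc++.h>
using namespace std;

struct Node {
    long long sum = 0;
    Node *left = nullptr, *right = nullptr;
};

const int MAXV = 1'000'000'000;

// Point update: add delta at value v, creating children only as needed.
void update(Node *node, int lo, int hi, int v, int delta) {
    node->sum += delta;
    if (lo == hi) return;
    int mid = lo + (hi - lo) / 2;
    if (v <= mid) {
        if (!node->left) node->left = new Node();   // lazy allocation
        update(node->left, lo, mid, v, delta);
    } else {
        if (!node->right) node->right = new Node();
        update(node->right, mid + 1, hi, v, delta);
    }
}

// Range sum over values [a, b]; missing children contribute 0.
long long query(Node *node, int lo, int hi, int a, int b) {
    if (!node || b < lo || hi < a) return 0;
    if (a <= lo && hi <= b) return node->sum;
    int mid = lo + (hi - lo) / 2;
    return query(node->left, lo, mid, a, b)
         + query(node->right, mid + 1, hi, a, b);
}

int main() {
    Node *root = new Node();
    vector<int> sal = {1, 5, 3, 3, 2};
    for (int s : sal) update(root, 0, MAXV, s, +1);

    cout << query(root, 0, MAXV, 2, 3) << '\n';     // 3 elements in [2, 3]

    // Point update: third element 3 -> 4.
    update(root, 0, MAXV, sal[2], -1);
    sal[2] = 4;
    update(root, 0, MAXV, 4, +1);
    cout << query(root, 0, MAXV, 2, 3) << '\n';     // now 2
}

Each update touches O(log(max value)) nodes, so total memory stays around O((N + Q) log(max value)) nodes even though the value range is 10^9.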

— Jacob Steinebronn

You can use a policy-based data structure (GNU PBDS), which has useful methods such as order_of_key(), which returns the number of items strictly less than a given key. Calling it twice, as getcnt(b+1) - getcnt(a) in the code below, gives the count of items in the range [a, b]. For more info, see https://codeforces.com/blog/entry/11080 and https://gcc.gnu.org/onlinedocs/libstdc++/manual/policy_data_structures.html

After a lot of research I found that this extension is very useful when working with tree-based structures.

I tested the below code and it passes all the test cases.

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp> // common PBDS file
#include <ext/pb_ds/tree_policy.hpp>     // tree_order_statistics_node_update

using namespace std;
using namespace __gnu_pbds;

// An order-statistics tree keyed on {salary, employee index}; the index
// makes every key unique so duplicate salaries can coexist.
template <class T>
using cust_set = tree<T, null_type, less<T>, rb_tree_tag,
                      tree_order_statistics_node_update>;

cust_set<array<int, 2>> freq;

// Number of stored salaries strictly less than x.
int getcnt(int x)
{
    return freq.order_of_key({x, 0});
}

int main()
{
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q;
    cin >> n >> q;
    vector<int> emp(n);
    for (int i = 0; i < n; i++)
    {
        cin >> emp[i];
        freq.insert({emp[i], i});
    }

    char c;
    int a, b;
    while (q--)
    {
        cin >> c >> a >> b;
        if (c == '?')
        {
            // Salaries in [a, b] = (count < b+1) - (count < a).
            cout << getcnt(b + 1) - getcnt(a) << "\n";
        }
        else
        {
            // Update: employee a (1-indexed) now earns b.
            --a;
            freq.erase({emp[a], a});
            emp[a] = b;
            freq.insert({emp[a], a});
        }
    }
    return 0;
}
— rootkonda