Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting according to query

Given an array of N positive elements. Lets suppose we list all N × (N+1) / 2 non-empty continuous subarrays of the array A and then replaced all the subarrays with the maximum element present in the respective subarray. So now we have N × (N+1) / 2 elements where each element is maximum among its subarray.

Now we are having Q queries, where each query is one of 3 types :

1 K : We need to count of numbers strictly greater than K among those N × (N+1) / 2 elements.

2 K : We need to count of numbers strictly less than K among those N × (N+1) / 2 elements.

3 K : We need to count of numbers equal to K among those N × (N+1) / 2 elements.

Now main problem am facing is N can be upto 10^6. So i can't generate all those N × (N+1) / 2 elements. Please help to solve this porblem.

Example : Let N=3 and we have Q=2. Let array A be [1,2,3] then all sub arrays are :

[1] -> [1]
[2] -> [2]
[3] -> [3]
[1,2] -> [2]
[2,3] -> [3]
[1,2,3] -> [3]

So now we have [1,2,3,2,3,3]. As Q=2 so :

Query 1 : 3 3

It means we need to tell count of numbers equal to 3. So answer is 3 as there are 3 numbers equal to 3 in the generated array.

Query 2 : 1 4

It means we need to tell count of numbers greater than 4. So answer is 0 as no one is greater than 4 in generated array.

Now both N and Q can be up to 10^6. So how to solve this problem. Which data structure should be suitable to solve it.

like image 495
mat7 Avatar asked Aug 08 '15 07:08

mat7


People also ask

How do you do a COUNT in a query?

On the Design tab, in the Show/Hide group, click Totals. The Total row appears in the design grid and Group By appears in the row for each field in the query. In the Total row, click the field that you want to count and select Count from the resulting list.

How do you write a COUNT in SQL query?

Syntax of Select Count Function in SQL In the syntax, we have to specify the column's name after the COUNT keyword and the name of the table on which the Count function is to be executed.


2 Answers

I believe I have a solution in O(N + Q*log N) (More about time complexity). The trick is to do a lot of preparation with your array before even the first query arrives.

  1. For each number, figure out where is the first number on left / right of this number that is strictly bigger.

Example: for array: 1, 8, 2, 3, 3, 5, 1 both 3's left block would be position of 8, right block would be the position of 5.

This can be determined in linear time. This is how: Keep a stack of previous maximums in a stack. If a new maximum appears, remove maximums from the stack until you get to a element bigger than or equal to the current one. Illustration:

Illustration

In this example, in the stack is: [15, 13, 11, 10, 7, 3] (you will of course keep the indexes, not the values, I will just use value for better readability).

Now we read 8, 8 >= 3 so we remove 3 from stack and repeat. 8 >= 7, remove 7. 8 < 10, so we stop removing. We set 10 as 8's left block, and add 8 to the maximums stack.

Also, whenever you remove from the stack (3 and 7 in this example), set the right block of removed number to the current number. One problem though: right block would be set to the next number bigger or equal, not strictly bigger. You can fix this with simply checking and relinking right blocks.

  1. Compute what number is how many times a maximum of some subsequence.

Since for each number you now know where is the next left / right bigger number, I trust you with finding appropriate math formula for this.

Then, store the results in a hashmap, key would be a value of a number, and value would be how many times is that number a maximum of some subsequence. For example, record [4->12] would mean that number 4 is the maximum in 12 subsequences.

Lastly, extract all key-value pairs from the hashmap into an array, and sort that array by the keys. Finally, create a prefix sum for the values of that sorted array.

  1. Handle a request

For request "exactly k", just binary search in your array, for more/less thank``, binary search for key k and then use the prefix array.

like image 124
kajacx Avatar answered Oct 01 '22 02:10

kajacx


This answer is an adaptation of this other answer I wrote earlier. The first part is exactly the same, but the others are specific for this question.

Here's an implemented a O(n log n + q log n) version using a simplified version of a segment tree.

Creating the segment tree: O(n)

In practice, what it does is to take an array, let's say:

A = [5,1,7,2,3,7,3,1]

And construct an array-backed tree that looks like this:

segment tree

In the tree, the first number is the value and the second is the index where it appears in the array. Each node is the maximum of its two children. This tree is backed by an array (pretty much like a heap tree) where the children of the index i are in the indexes i*2+1 and i*2+2.

Then, for each element, it becomes easy to find the nearest greater elements (before and after each element).

To find the nearest greater element to the left, we go up in the tree searching for the first parent where the left node has value greater and the index lesser than the argument. The answer must be a child of this parent, then we go down in the tree looking for the rightmost node that satisfies the same condition.

Similarly, to find the nearest greater element to the right, we do the same, but looking for a right node with an index greater than the argument. And when going down, we look for the leftmost node that satisfies the condition.

Creating the cumulative frequency array: O(n log n)

From this structure, we can compute the frequency array, that tells how many times each element appears as maximum in the subarray list. We just have to count how many lesser elements are on the left and on the right of each element and multiply those values. For the example array ([1, 2, 3]), this would be:

[(1, 1), (2, 2), (3, 3)]

This means that 1 appears only once as maximum, 2 appears twice, etc.

But we need to answer range queries, so it's better to have a cumulative version of this array, that would look like:

[(1, 1), (2, 3), (3, 6)]

The (3, 6) means, for example, that there are 6 subarrays with maxima less than or equal to 3.

Answering q queries: O(q log n)

Then, to answer each query, you just have to make binary searches to find the value you want. For example. If you need to find the exact number of 3, you may want to do: query(F, 3) - query(F, 2). If you want to find those lesser than 3, you do: query(F, 2). If you want to find those greater than 3: query(F, float('inf')) - query(F, 3).

Implementation

I've implemented it in Python and it seems to work well.

import sys, random, bisect
from collections import defaultdict
from math import log, ceil

def make_tree(A):
    n = 2**(int(ceil(log(len(A), 2))))
    T = [(None, None)]*(2*n-1)

    for i, x in enumerate(A):
        T[n-1+i] = (x, i)

    for i in reversed(xrange(n-1)):
        T[i] = max(T[i*2+1], T[i*2+2])

    return T

def print_tree(T):
    print 'digraph {'
    for i, x in enumerate(T):
        print '    ' + str(i) + '[label="' + str(x) + '"]'
        if i*2+2 < len(T):
            print '    ' + str(i)+ '->'+ str(i*2+1)
            print '    ' + str(i)+ '->'+ str(i*2+2)

    print '}'

def find_generic(T, i, fallback, check, first, second):
    j = len(T)/2+i
    original = T[j]
    j = (j-1)/2

    #go up in the tree searching for a value that satisfies check
    while j > 0 and not check(T[second(j)], original):
        j = (j-1)/2

    #go down in the tree searching for the left/rightmost node that satisfies check
    while j*2+1<len(T):
        if check(T[first(j)], original):
            j = first(j)
        elif check(T[second(j)], original):
            j = second(j)
        else:
            return fallback

    return j-len(T)/2


def find_left(T, i, fallback):
    return find_generic(T, i, fallback, 
        lambda a, b: a[0]>b[0] and a[1]<b[1],  #value greater, index before
        lambda j: j*2+2,                       #rightmost first
        lambda j: j*2+1                        #leftmost second
    ) 


def find_right(T, i, fallback):
    return find_generic(T, i, fallback,
        lambda a, b: a[0]>=b[0] and a[1]>b[1], #value greater or equal, index after
        lambda j: j*2+1,                       #leftmost first
        lambda j: j*2+2                        #rightmost second
    )       

def make_frequency_array(A):
    T = make_tree(A)

    D = defaultdict(lambda: 0)
    for i, x in enumerate(A):
        left = find_left(T, i, -1)
        right = find_right(T, i, len(A))
        D[x] += (i-left) * (right-i)

    F = sorted(D.items())
    for i in range(1, len(F)):
        F[i] = (F[i][0], F[i-1][1] + F[i][1])

    return F

def query(F, n):
    idx = bisect.bisect(F, (n,))
    if idx>=len(F): return F[-1][1]
    if F[idx][0]!=n: return 0
    return F[idx][1]

F = make_frequency_array([1,2,3])
print query(F, 3)-query(F, 2) #3 3
print query(F, float('inf'))-query(F, 4) #1 4
print query(F, float('inf'))-query(F, 1) #1 1
print query(F, 2) #2 3
like image 41
Juan Lopes Avatar answered Oct 01 '22 02:10

Juan Lopes