Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I find the k-th largest element in an exponentially large list?

Suppose there are n sets of real numbers: S[1], S[2], ..., S[n]. We know two things about these sets:

  1. Each set S[i] has exactly 3 elements.

  2. All elements in each of the sets S[i] are real numbers in the [0, 1] range. (I don't know if this detail can be helpful for the solution, though).

Let's consider a set T of all numbers that can be represented as p[1] * p[2] * p[3] * ... * p[n] where p[i] is an element of S[i]. This set T, obviously, has 3^n elements.

My question is, given the sets S[1], S[2], ..., S[n] (1 <= n <= 30) and some 1 <= k <= 10 as input, can we find the k-th largest number in T faster than in O(3^n) time? It's important that I need not only the k-th largest number, but also the corresponding numbers (p[1], p[2], p[3], ... , p[n]) that produce it.

Even if the answer is no, I would appreciate any hints on how you would solve this problem approximately, maybe, by using some heuristics? I know about beam search, but maybe you could suggest something else? And even for beam search, it is not really clear how to implement it here the best way.

If the exact answer can be obtained algorithmically in less than O(3^n) time, I would greatly appreciate it if you could point out the solution.

like image 967
Swistack Avatar asked Sep 02 '21 17:09

Swistack


People also ask

How do you find the largest KTH element?

To find the kth largest element, we can pass k= length(Array) – k. Now let's implement the partition method, which picks the rightmost element as a pivot, puts it at the appropriate index, and partitions the array in such a way that elements at lower indexes should be less than the pivot element.

What is k th largest element in an array?

It can be clearly observed that Kth largest element is the same as (N – K)th smallest element, where N is the size of the given array. Therefore, we can apply the Kth smallest approach to this problem. Firstly, a pivot element must be chosen, similar to what we do in quicksort.

What will the run time to find the kth largest element in a sorted array of size n be provided there are no duplicates in the array?

Statistically speaking, the time it takes to find the kth element grows with n, O(n).

How do you find the smallest k element?

K'th smallest element in an unsorted array using sorting: Sort the given array and return the element at index K-1 in the sorted array. Follow the given steps to solve the problem: Sort the input array in the increasing order. Return the element at the K-1 index (0 – Based indexing) in the sorted array.


2 Answers

Well, you know that the largest product is the one that uses the largest factor from each set.

Furthermore, every other product can be formed by starting with a larger one, and then decreasing the factor chosen in exactly one set.

That leads to a simple search:

  1. Put the largest product in a max-first priority queue.

  2. Repeat k times:

    a. Remove the largest product p from the priority queue

    b. For each set that has a smaller number than the one selected in p, generate the product formed by decreasing that number to the next lower one in that set. If this selection of factors hasn't been seen before, then add it to the priority queue.

Products will be removed from the queue in decreasing order, so the kth one you take out is the kth largest.

Complexity is about N*(k log kN), depending on how you implement things.

Note that there may be multiple ways to select the factors that produce the same product. This solution considers those ways to be distinct products, i.e., each way is counted when finding the kth largest. That may or may not be what you want.

like image 185
Matt Timmermans Avatar answered Nov 14 '22 22:11

Matt Timmermans


To put the previous discussion into code we can do the following:

import operator
from functools import partial, reduce
import heapq

def prod_by_data(tup, data):
    return reduce(operator.mul, (datum[t] for t, datum in zip(tup, data)), 1)

def downset(tup):
    return [
        tuple(t - (1 if j == i else 0) for j, t in enumerate(tup))
        for i in range(len(tup))
        if tup[i] > 0
    ]

data = [
    [1, 2, 3],
    [4, 2, 1],
    [8, 1, 3],
    [1, 1, 2],
]

data = [sorted(d) for d in data]
prod = partial(prod_by_data, data=data)

k_smallest = [tuple(len(dat) - 1 for dat in data)]
possible_k_smallest = []

while len(k_smallest) < 10:
    new_possible = sorted(downset(k_smallest[-1]), key=prod, reverse=True)
    possible_k_smallest = heapq.merge(possible_k_smallest, new_possible, key=prod, reverse=True)
    k_smallest.append(next(possible_k_smallest))

print(k_smallest)
print([prod(tup) for tup in k_smallest])

We maintain a heap of the smallest elements. After we pop off the smallest, we need to check all if its downset (tuples that differ in exactly one position), because those tuples might be the next smallest element.

We see that we look through k - 1 times sorting O(n) elements each time with a key that itself is O(n). Because of the key this should make the sort take O(n^2) instead of O(n log n). The heapq is lazy and so popping from it is actually O(k). The initial sorting and preparation should be O(n) as well. Overall I think this makes everything O(k n^2).

like image 20
Kyle Parsons Avatar answered Nov 14 '22 21:11

Kyle Parsons