Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Minimize Subset Span

I need to select one number from each set in a list of sets of positive integers, such that the difference between the smallest and largest number in the output list is minimized. For example:

numbers = [
  (0, 4, 9),
  (3, 5),
  (7, 8, 9)
]

In this case, the list containing one number from each set and minimizing range is [4, 5, 7], which has a range of 3.

I've thought of a few basic approaches so far, all of which involve starting at the first (top) set and working downwards to build the resulting list. A basic approach seems to be trying each possible path in a depth-first search, while keeping track of the minimum range achieved so far and halting any paths that exceed the current minimum range.

A second approach might be to binary search each set for the numbers closest to the previous number (selected from the last set), selecting the two closest numbers with one greater than and one less than the previous number, and then using a depth-first search on this (significantly smaller) search space. I'm not totally sure this approach would lead to a globally optimal solution though.

I expect this problem can almost certainly be mapped to a well-known search algorithm but I'm having trouble finding the best algorithm. What existing algorithms are well-suited to solve such a problem?

like image 581
B00TK1D Avatar asked Feb 25 '26 04:02

B00TK1D


2 Answers

Here's an idea. Not sure if it is optimal but it should be correct.

  1. sort each set.
  2. pop the smallest item from each set into a minheap
  3. track the max - min of elements in the heap
  4. repeat:
    1. pop min from heap
    2. replace it the next smallest item from the same set (pop min from set again)
    3. if new max-min is less than before, save the current configuration
    4. (if said set is empty, stop.)

of course, you'd insert tuples of (value, parent_set) into the heap so you know which value came from which set once they're popped from the heap and needs to be replaced

import heapq
from heapq import heapreplace


def minimize_range(sets: list[set[int]]) -> list[int]:
    """Finds the minimum range that contains one element from each set."""
    sets = [list(s) for s in sets]  # convert to lists to allow sorting
    for s in sets:
        s.sort(reverse=True)  # reverse so .pop() gives the smallest element

    heap = [(s.pop(), i) for i, s in enumerate(sets)]
    heapq.heapify(heap)

    # to track minimum range
    range_hi = max(v[0] for v in heap)
    min_range = range_hi - heap[0][0]  # smallest item in heap is at index 0
    best_result = heap.copy()

    while True:
        min_val, min_set_idx = heap[0]
        if not sets[min_set_idx]:
            break  # the set is empty, we're done

        # replace the smallest element in the heap with the next element from the same set
        next_val = sets[min_set_idx].pop()
        heapreplace(heap, (next_val, min_set_idx))

        # calculate the new range
        range_hi = max(range_hi, next_val)
        new_range = range_hi - heap[0][0]
        if new_range < min_range:
            min_range = new_range
            best_result = heap.copy()

    return [v[0] for v in best_result]


if __name__ == '__main__':
    numbers = [
        {0, 4, 9},
        {3, 5},
        {7, 8, 9}
    ]
    result = minimize_range(numbers)
    print(result)  # Output: [4, 5, 7]
like image 183
Akioweh Avatar answered Feb 27 '26 18:02

Akioweh


Same idea as Akioweh's, i.e., merging the sorted sets with a heap containing one element from each set. But with two main differences:

  • Instead of copying the entire current heap every time a better one is found, I only remember its low point. That's enough to build the list of selections at the end. In your example I end up with lo = 4 and then build [4, 5, 7] by taking the smallest number ≥4 from each set. This way I only need 0.3 seconds for my input sets = [set(range(50000))] + [{50000}] * 50000 for which Akioweh's took 15 seconds.
  • For fun I use heapq.merge, which is implemented with a heap similarly, and I wrap each sorted set with a tracker that updates the high point. I wouldn't rely on implementation details like that in important code, though.
from heapq import merge


def minimize_range(sets: list[set[int]]) -> list[int]:
    hi = -1e999
    def track_hi(s):
        nonlocal hi
        for x in sorted(s):
            hi = max(hi, x)
            yield x
        hi = 1e999
    lo = min(
        (hi - lo, lo)
        for lo in merge(*map(track_hi, sets))
    )[1]
    return [
        min(x for x in s if x >= lo)
        for s in sets
    ]


if __name__ == '__main__':
    numbers = [
        {0, 4, 9},
        {3, 5},
        {7, 8, 9}
    ]
    result = minimize_range(numbers)
    print(result)  # Output: [4, 5, 7]

Attempt This Online!

like image 39
Kelly Bundy Avatar answered Feb 27 '26 18:02

Kelly Bundy



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!