Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to find best dimensions combination

I am looking for an algorithm to find the best dimension combination to accomplish a desired result.

Take the following as example:

|   A    |    B   |   C   |  y  |
|--------|--------|-------|-----|
| dog    | house1 | green | 30  |
| dog    | house1 | blue  | 15  |
| cat    | house1 | green | 20  |
| cat    | house2 | red   |  5  |
| turtle | house3 | green | 50  |

A, B, C are the measured dimensions. y is the measured result.

If I want to get all combinations of dimensions that accomplish y >= 50 so the results will be:

turtle, house3, green
turtle, any,    green
turtle, house3, any
turtle, any,    any
any,    house3, green
any,    house3, any
any,    any,    green
any,    house1, green
any,    house1, any

Maybe it's a easy problem but I was trying to figure an optimal solution in terms of O(n) and I didn't found it.

like image 868
jrdi Avatar asked Sep 19 '15 10:09

jrdi


1 Answers

Start with a work queue containing (any, any, ..., any), 0. The elements of this queue will be pairs consisting of a combination and a number of elements on the left that cannot be changed from any (this will make more sense shortly). Until the work queue is empty, remove one element from it and compute the corresponding sum. If it doesn't meet the threshold, then discard it. Otherwise, report it as one of the sought combinations. For each any that can be changed, for each value in that column, enqueue a combination consisting of the current one with any replaced by that value, with the index locking down all previous any values.

Considering an output-sensitive bound, this is within a polynomial factor of optimal (in general, there can be exponentially many combinations).

In Python 3:

def overthreshold(data, threshold):
    queue = [(('any',) * len(data[0][0]), 0)]
    for combination, begin in queue:
        if sum(row[1] for row in data
               if all(x in {'any', y}
                      for x, y in zip(combination, row[0]))) < threshold:
            continue
        yield combination
        for i in range(begin, len(combination)):
            if combination[i] == 'any':
                queue.extend((combination[:i] + (x,) + combination[i+1:], i + 1)
                             for x in {row[0][i] for row in data})


def demo():
    data = [
        (('dog',    'house1', 'green'), 30),
        (('dog',    'house1', 'blue'),  15),
        (('cat',    'house1', 'green'), 20),
        (('cat',    'house2', 'red'),    5),
        (('turtle', 'house3', 'green'), 50),
    ]
    for combination in overthreshold(data, 50):
        print(combination)
like image 140
David Eisenstat Avatar answered Oct 31 '22 17:10

David Eisenstat