I am looking for an algorithm to find the best dimension combination to accomplish a desired result.
Take the following as example:
| A | B | C | y |
|--------|--------|-------|-----|
| dog | house1 | green | 30 |
| dog | house1 | blue | 15 |
| cat | house1 | green | 20 |
| cat | house2 | red | 5 |
| turtle | house3 | green | 50 |
A, B, C are the measured dimensions. y is the measured result.
If I want to get all combinations of dimensions that accomplish y >= 50 so the results will be:
turtle, house3, green
turtle, any, green
turtle, house3, any
turtle, any, any
any, house3, green
any, house3, any
any, any, green
any, house1, green
any, house1, any
Maybe it's a easy problem but I was trying to figure an optimal solution in terms of O(n) and I didn't found it.
Start with a work queue containing (any, any, ..., any), 0
. The elements of this queue will be pairs consisting of a combination and a number of elements on the left that cannot be changed from any
(this will make more sense shortly). Until the work queue is empty, remove one element from it and compute the corresponding sum. If it doesn't meet the threshold, then discard it. Otherwise, report it as one of the sought combinations. For each any
that can be changed, for each value in that column, enqueue a combination consisting of the current one with any
replaced by that value, with the index locking down all previous any
values.
Considering an output-sensitive bound, this is within a polynomial factor of optimal (in general, there can be exponentially many combinations).
In Python 3:
def overthreshold(data, threshold):
queue = [(('any',) * len(data[0][0]), 0)]
for combination, begin in queue:
if sum(row[1] for row in data
if all(x in {'any', y}
for x, y in zip(combination, row[0]))) < threshold:
continue
yield combination
for i in range(begin, len(combination)):
if combination[i] == 'any':
queue.extend((combination[:i] + (x,) + combination[i+1:], i + 1)
for x in {row[0][i] for row in data})
def demo():
data = [
(('dog', 'house1', 'green'), 30),
(('dog', 'house1', 'blue'), 15),
(('cat', 'house1', 'green'), 20),
(('cat', 'house2', 'red'), 5),
(('turtle', 'house3', 'green'), 50),
]
for combination in overthreshold(data, 50):
print(combination)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With