Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Establishing highest score for a set of combinations

I'm programming in python.

I have the data of the following form:

(A, B, C, D, E, F, G, H, I)

Segments of this data are associated with a score, for example:

scores:

    (A, B, C, D) = .99
    (A, B, C, E) = .77
    (A, B, E) = .66
    (G,) = 1
    (I,) = .03
    (H, I) = .55
    (I, H) = .15
    (E, F, G) = .79
    (B,) = .93
    (A, C) = .46
    (D,) = .23
    (D, F, G) = .6
    (F, G, H) = .34
    (H,) = .09
    (Y, Z) = 1

We can get a score for this data as follows:

A B C E + D F G + H I = .77 * .6 * .55 = 0.2541

another possiblity is:

A B C D + E F G + H + I = .99 * .79 * .09 * .03 = 0.00211167

So, the first combination gives the higher score.

I wish to write an algorithm to establish for the data above the highest possible score. The members of data should no be repeated more than once. In other words:

A B C E + E F G + D + H I 

is not valid. How would you recommend I go about solving this?

Thanks,

Barry

EDIT: I should clarify that (H, I) != (I, H) and that (I, H) is not a subsegment for ABCDEFGHI, but is a subsegment of ABIHJ. Another thing I should mention is that scores is a very large set (millions) and the segment on which we're calculating the score has an average length of around 10. Furthermore, the way in which I calculate the score might change in the future. Maybe I'd like to add the subsegments and take an average instead of multipling, who knows... for that reason it might be better to seperate the code which calculates the possible combinations from the actual calculation of the score. At the moment, I'm inclined to think that itertools.combinations might offer a good starting point.

like image 392
Baz Avatar asked Dec 26 '11 21:12

Baz


1 Answers

Brute-forcing, by using recursion (for each segment in order, we recursively find the best score using the segment, and the best score not using the segment. A score of 0 is assigned if there is no possible combination of segments for the remaining items):

segment_scores = (('A', 'B', 'C', 'D'), .99), (('A', 'B', 'C', 'E'), .77) #, ...

def best_score_for(items, segments, subtotal = 1.0):
    if not items: return subtotal
    if not segments: return 0.0
    segment, score = segments[0]
    best_without = best_score_for(items, segments[1:], subtotal)
    return max(
        best_score_for(items.difference(segment), segments[1:], subtotal * score),
        best_without
    ) if items.issuperset(segment) else best_without

best_score_for(set('ABCDEFGHI'), segment_scores) # .430155
like image 152
Karl Knechtel Avatar answered Sep 19 '22 22:09

Karl Knechtel