I'm programming in python.
I have the data of the following form:
(A, B, C, D, E, F, G, H, I)
Segments of this data are associated with a score, for example:
scores:
(A, B, C, D) = .99
(A, B, C, E) = .77
(A, B, E) = .66
(G,) = 1
(I,) = .03
(H, I) = .55
(I, H) = .15
(E, F, G) = .79
(B,) = .93
(A, C) = .46
(D,) = .23
(D, F, G) = .6
(F, G, H) = .34
(H,) = .09
(Y, Z) = 1
We can get a score for this data as follows:
A B C E + D F G + H I = .77 * .6 * .55 = 0.2541
another possiblity is:
A B C D + E F G + H + I = .99 * .79 * .09 * .03 = 0.00211167
So, the first combination gives the higher score.
I wish to write an algorithm to establish for the data above the highest possible score. The members of data should no be repeated more than once. In other words:
A B C E + E F G + D + H I
is not valid. How would you recommend I go about solving this?
Thanks,
Barry
EDIT: I should clarify that (H, I) != (I, H) and that (I, H) is not a subsegment for ABCDEFGHI, but is a subsegment of ABIHJ. Another thing I should mention is that scores is a very large set (millions) and the segment on which we're calculating the score has an average length of around 10. Furthermore, the way in which I calculate the score might change in the future. Maybe I'd like to add the subsegments and take an average instead of multipling, who knows... for that reason it might be better to seperate the code which calculates the possible combinations from the actual calculation of the score. At the moment, I'm inclined to think that itertools.combinations might offer a good starting point.
Brute-forcing, by using recursion (for each segment in order, we recursively find the best score using the segment, and the best score not using the segment. A score of 0 is assigned if there is no possible combination of segments for the remaining items):
segment_scores = (('A', 'B', 'C', 'D'), .99), (('A', 'B', 'C', 'E'), .77) #, ...
def best_score_for(items, segments, subtotal = 1.0):
if not items: return subtotal
if not segments: return 0.0
segment, score = segments[0]
best_without = best_score_for(items, segments[1:], subtotal)
return max(
best_score_for(items.difference(segment), segments[1:], subtotal * score),
best_without
) if items.issuperset(segment) else best_without
best_score_for(set('ABCDEFGHI'), segment_scores) # .430155
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With