Normally in a heapsort you build a heap of the entire data set (100,000 elements in your case). Instead, only allow the heap to grow to 20,000 elements. Keep the largest element at the top of the heap.
Quick sort is the better suited for large data sets. [8]It is the fastest and efficient algorithm for large sets of data. But it is inefficient if the elements in the list are already sorted which results in the worst case time complexity of O(n2).
Quicksort. Quicksort is one of the most efficient sorting algorithms, and this makes of it one of the most used as well. The first thing to do is to select a pivot number, this number will separate the data, on its left are the numbers smaller than it and the greater numbers on the right.
But since it has the upper hand in the average cases for most inputs, Quicksort is generally considered the “fastest” sorting algorithm.
You can use the A* algorithm for this. As the heuristic function, take the cost of the players already selected plus the maximum cost of the remaining players needed to complete the team.
Here's some Python code. First, the setup:
def split(team):
return [(x.strip(), y.strip(), float(z)) for x, y, z
in (line.split(",") for line in team.splitlines())]
team1 = ("Toronto",
split("""Player 1, SS, 1.0
Player 2, 1B, 1.0
Player 3, 1B, 2.0
Player 4, 2B, 2.0
Player 5, 3B, 4.0
Player 6, 3B, 3.0
Player 7, C, 4.0
Player 8, OF, 1.0
Player 9, OF, 2.0
Player 10, OF, 5.0
Player 11, OF, 6.0"""))
team2 = ("Washington",
split("""Player 1, SS, 3.0
Player 2, 1B, 2.0
Player 3, 1B, 1.0
Player 4, 2B, 2.0
Player 5, 3B, 2.0
Player 6, 3B, 3.0
Player 7, C, 7.0
Player 8, OF, 1.0
Player 9, OF, 2.0
Player 10, OF, 2.0
Player 11, OF, 3.0"""))
teams = [team1, team2]
We also need a function to tell whether a certain combination is valid. We do not need to check that no player appears twice, as they will be unique by construction, so we just have how often each of the player positions appear.
ALLOWED = {"1B": 1, "2B": 1, "3B": 1, "SS": 1, "C": 1, "OF": 3}
def legal(players):
d = {}
for p in players:
d[p[1]] = d.get(p[1], 0) + 1
return all(d[x] <= ALLOWED[x] for x in d)
Now for the interesting part: The A* search. We are using a heap, or priority queue, each entry in the heap being a tuple (estimated_cost, players, position)
, so the (partial) team with the highest total estimated cost will be first in the heap (we use -cost
, as the heap will be sorted from least to greatest). pos
tells us our current position in the list of players, sorted by individual cost, highest first.
We then pop the next element from the heap, check whether it's a solution, and either yield
the result1), or add another player to the team, updating the estimated cost with the cost of the current team, filled up with the next-expensive players from the bulk of remaining players (new_players + team_players[i+1:i+more_needed]
), and add that combination to the heap.
def generator(team_players, num):
team_players = sorted(team_players, key=lambda p: p[2], reverse=True)
queue = [(-sum(p[2] for p in team_players[:num]), [], 0)]
while queue:
estimated_cost, players, pos = heapq.heappop(queue)
more_needed = num - len(players)
if not more_needed:
yield estimated_cost, players
else:
for i in range(pos, len(team_players) - more_needed + 1):
player = team_players[i]
new_players = players + [player]
if legal(new_players):
estimate = -sum(p[2] for p in new_players + team_players[i+1:i+more_needed])
heapq.heappush(queue, (estimate, new_players, i+1))
Finally, we have a second generator function, mapping the above generators to team name and generating the next-best solution for each team (if any) one after another. Another heap is used to store the currently best solutions for each team, sorted by total cost.
def generate_all(teams):
generators = {name: generator(players, 4) for name, players in teams}
best_teams = [(next(generators[name]), name) for name in generators]
heapq.heapify(best_teams)
while best_teams:
team, name = heapq.heappop(best_teams)
yield name, team
try:
heapq.heappush(best_teams, (next(generators[name]), name))
except StopIteration:
pass
Now just generate and print the first few from that generator:
all_teams_generator = generate_all(teams)
for _ in range(10):
print(next(all_teams_generator))
Output in format (team-name, (-cost, [(player1), ..., (player4)]))
:
('Toronto', (-19.0, [('Player 11', 'OF', 6.0), ('Player 10', 'OF', 5.0), ('Player 5', '3B', 4.0), ('Player 7', 'C', 4.0)]))
('Toronto', (-18.0, [('Player 11', 'OF', 6.0), ('Player 10', 'OF', 5.0), ('Player 7', 'C', 4.0), ('Player 6', '3B', 3.0)]))
('Toronto', (-17.0, [('Player 11', 'OF', 6.0), ('Player 10', 'OF', 5.0), ('Player 5', '3B', 4.0), ('Player 3', '1B', 2.0)]))
('Toronto', (-17.0, [('Player 11', 'OF', 6.0), ('Player 10', 'OF', 5.0), ('Player 5', '3B', 4.0), ('Player 4', '2B', 2.0)]))
('Toronto', (-17.0, [('Player 11', 'OF', 6.0), ('Player 10', 'OF', 5.0), ('Player 5', '3B', 4.0), ('Player 9', 'OF', 2.0)]))
('Toronto', (-17.0, [('Player 11', 'OF', 6.0), ('Player 10', 'OF', 5.0), ('Player 7', 'C', 4.0), ('Player 3', '1B', 2.0)]))
('Toronto', (-17.0, [('Player 11', 'OF', 6.0), ('Player 10', 'OF', 5.0), ('Player 7', 'C', 4.0), ('Player 4', '2B', 2.0)]))
('Toronto', (-17.0, [('Player 11', 'OF', 6.0), ('Player 10', 'OF', 5.0), ('Player 7', 'C', 4.0), ('Player 9', 'OF', 2.0)]))
('Toronto', (-16.0, [('Player 11', 'OF', 6.0), ('Player 10', 'OF', 5.0), ('Player 5', '3B', 4.0), ('Player 1', 'SS', 1.0)]))
('Toronto', (-16.0, [('Player 11', 'OF', 6.0), ('Player 10', 'OF', 5.0), ('Player 5', '3B', 4.0), ('Player 2', '1B', 1.0)]))
Addendum: On Performance. For small teams, generating all the combinations and picking the best ones, as in Peter's answer, is probably indeed "good enough", and certainly much simpler. However, this version, using A*, can be a good deal faster and might be worth a try if you have larger teams, or a greater number of different teams, or if you have to do this very often. Here's some timing information, generating the top ten teams using both algorithms.
>>> timeit.timeit("allcomb.main(10)", "import allcomb", number=1000)
6.20834589005
>>> timeit.timeit("astar.main(10)", "import astar", number=1000)
1.55367398262
Note that this is for 1000 iterations, so for a single iteration, both algorithms will need just a few milliseconds. Also note that due to the generative nature of the A*, it is much faster the fewer combinations you need. For just the best combination, it's six times faster, and for the top ten, it's four times faster, but if you want all the combinations anyway, the A* is in fact three times slower than just generating all the combinations in the first place.
1) Note that this is using generator functions, which are pretty unique to Python, but you can do the same using a class encapsulating the queue as a member and providing a method, returning the next valid solution. If you are interested, I could also rewrite it that way.
You can sort players in decreasing order in each team and then iterate throw sorted list. Code can look something like this:
int highestCombinedValue = 0, i = 0, currentNumber = 0, count1B = 0, count2B = 0, cpunt3B = 0, countSS =0, countC = 0, countOF = 0;
while(currentNumber < 4 && list.get(i) != null)
{
currentPlayer = list.get(i);
switch currentPlayer.position
{
case: 1B
then if(count1B < 1)
{
highestCombinedValue = highestCombinedValue + currentPlayer.value;
count1B = count1B + 1;
}
break;
case: 2B
...
case: OF
then if(count1B < 3)
{
highestCombinedValue = highestCombinedValue + currentPlayer.value;
count1B = count1B + 1;
}
break;
}
i = i + 1;
}
You can store team with result in list and sort this result list in decreasing order to obtain top teams.
I think this is not as complicated as you think. Use a bit of mathematical inference to simplify your problem.
Sort each team in decreasing score value. A quick and easy way is to put the team members into a max heap.
To find the highest scoring combination in each team, pop the heap 4 times and add up the values of those 4 nodes. This is your max combination.
Worst case complexity is O(n^2).
To generalize this to top n, pop top 3, sum, then for each item left, pop that item, sum it, and stick in another max heap to keep this list sorted. Continue down the list by repeating the same for top-1 node in the original heap.
Given a team size of 20, there are only 20*19*18*17/(4*3*2*1) = 4845
combinations of 4 players from the team (not all of which are legal) so you should be able to simply brute force all possible team combinations, sort, and select the highest scoring ones.
Here is some example Python code that does this for your example:
import itertools
def f(s):
players = []
for x in s.splitlines():
player,pos,score = x.split(',')
score=float(score)
pos=pos.strip()
players.append( (player,pos,score) )
return players
teams={}
teams['Toronto'] = f("""Player 1, SS, 1.0
Player 2, 1B, 1.0
Player 3, 1B, 2.0
Player 4, 2B, 2.0
Player 5, 3B, 4.0
Player 6, 3B, 3.0
Player 7, C, 4.0
Player 8, OF, 1.0
Player 9, OF, 2.0
Player 10, OF, 5.0
Player 11, OF, 6.0""")
teams['Washington'] = f("""Player 1, SS, 3.0
Player 2, 1B, 2.0
Player 3, 1B, 1.0
Player 4, 2B, 2.0
Player 5, 3B, 2.0
Player 6, 3B, 3.0
Player 7, C, 7.0
Player 8, OF, 1.0
Player 9, OF, 2.0
Player 10, OF, 2.0
Player 11, OF, 3.0""")
def legal(comb):
"""Test if this is a legal team combination"""
count_of = 0
S = set()
for player,pos,score in comb:
if pos == 'OF':
count_of += 1
elif pos in S:
return False
else:
S.add(pos)
return count_of <= 3
A = [] # List of legal combinations
for team_name,players in teams.items():
for comb in itertools.combinations(players,4):
if legal(comb):
score = sum(p[2] for p in comb)
A.append( (score,comb,team_name) )
A.sort(reverse = True)
for score,comb,team_name in A[:2]:
print score,team_name,comb
Finding all combinations within each team as suggested by @Peter de Rivaz is certainly the easiest to do correctly. I included Ruby code as a baseline for testing something more sophisticated.
But it turns out you can apply a simple greedy algorithm to this problem. So the A* and other search propositions are overkill. Hill climbing is enough. I'm still not sure it's worth the trouble with your data size. I won't have time to code it. That wouldn't be very hard, but there's significant bookkeeping, and edge cases (like players and combinations with exactly equal values) must be handled with care.
The argument for greediness goes like this:
Proposition: Let
C = c_0, c_1, ..., c_N-1
be the combinations of 4 players arranged in decreasing order of total value. Then for each1 <= i < N
, we can say thatc_i
can be formed by taking somec_j
,0 <= j < i
(a combination of higher value) and substituting exactly one of its four elements P with a different player Q of minimally lower value. By minimally lower value we mean there is no player with the same position as Q with higher value that is still lower than P's.The proof by contradiction is tedious, but pretty simple and I won't give it.
How to turn this into an algorithm? The trick is that if we have any combination [A,B,C,D], the set of possible "next" combinations for the same team in sorted order is small. Just consider replacing each of the combo elements in U=[A,B,C,D] in succession with all possible "legal" players having minimally lower value. E.g, let next(P, pos) be the unique player with position pos having highest value lower than P. Let O be the set of all positions not in {pos(A),pos(B),pos(C),pos(D)}. Then the possible "successor" combinations in sorted order after [A,B,C,D] is just
succ(U) = next(A, pos(A)) union_(o\in O).next(A, o) union
next(B, pos(B)) union_(o\in O).next(B, o) union
next(C, pos(C)) union_(o\in O).next(C, o) union
next(D, pos(D)) union_(o\in O).next(D, o)
So succ(U) contains a small finite number of elements. The algorithm just starts with the max value combination for each team, puts it in a priority queue, then repeatedly pulls a combination off the queue (this is the output), finds its possible successors, and adds them to the queue. It keeps a "visited" set to prevent adding an element multiple times.
It's possible to show - though I won't do it here - that the heap will have at most O(n^3) elements on it, where n is the number of players on a team. So the run time will be O(log n) per combination taken out of the heap. You can stop any time you like.
Here is pseudocode:
Let H be a priority queue with add and deleteMax operations.
for each team t
P = sequence of players of t sorted in decreasing order of value
Let Cmax = {} # the maximum value combination for team t.
for each player p in P
if adding p to Cmax doesn't make it an invalid combo,
add p to Cmax
if |Cmax| == 4, break
end for
# Cmax is now the max value combination for t
add Cmax to H
end for
# H is now initialized with the max val player of each team
Let V = {} # A set of visited combinations
while H is not empty
let c = H.deleteMax
output c
Add succ(c) - V to H
V = V union succ(c)
}
Sorry I don't have time right now to code this. It would be fun.
Here's the "brute force" Ruby code:
Player = Struct.new('Player', :team, :name, :pos, :val)
# The algorithm.
def sorted_combos(data)
data.map{|team| team.combination(4).select{|g| combination_ok? g } }
.flatten(1).sort_by{|x| x.map(&:val).inject(&:+) }.reverse!
end
# Checking that a particular combination meets the membership rules.
def combination_ok?(g)
not_of = g.map(&:pos).select{|x| x != 'OF'}
not_of.size >= 1 && not_of.uniq.size == not_of.size
end
# Test.
def parse(teams)
teams.map{|t| t[1].split("\n").map{|x| x.split(/,\s*/) }
.map{|x| Player.new(t[0], x[0], x[1], x[2].to_f) } }
end
DATA = [
['Toronto',
'Player 1, SS, 1.0
Player 2, 1B, 1.0
Player 3, 1B, 2.0
Player 4, 2B, 2.0
Player 5, 3B, 4.0
Player 6, 3B, 3.0
Player 7, C, 4.0
Player 8, OF, 1.0
Player 9, OF, 2.0
Player 10, OF, 5.0
Player 11, OF, 6.0'
],
['Washington',
'Player 1, SS, 3.0
Player 2, 1B, 2.0
Player 3, 1B, 1.0
Player 4, 2B, 2.0
Player 5, 3B, 2.0
Player 6, 3B, 3.0
Player 7, C, 7.0
Player 8, OF, 1.0
Player 9, OF, 2.0
Player 10, OF, 2.0
Player 11, OF, 3.0'
],
]
print sorted_combos(parse(DATA))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With