Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

loop to make every combination of several lists

I have a somewhat complicated, or maybe not complicated, but long question that I'm going to try to distill down to just the basic parts and then maybe I can figure it out from there.

So what I'm trying to do is fill up a baseball roster. I have a list of players, which is then broken into several other lists, based on position. From this point, I want to fill the roster with every possible combination of players. I've written a basic script that accomplishes this, but as I have a large list, simply iterating through every combination is less than ideal. I used a subset of players just to test it, and it still took about an hour and a half.

Some players are eligible to be put in the roster at multiple positions, and as such, some names will be in two or more of the lists that I'm pulling names from. My first time-saving attempt it to check every iteration for whether it will contain a duplicate, and if so, skip that player.

import os, csv, time

player_pool_csv = open('Available Player Pool.csv', 'r') player_pool = csv.reader(player_pool_csv) player_pool = list(player_pool)

roster = ['a','b','c','d','e','f','g','h']
# roster with characters is my solution to the TypeError: unhashable type: 'list' 
# when I check len(roster) against len(set(roster)) in the first loop

catcher_pool = []
first_pool = []
second_pool = []
third_pool = []
short_pool = []
of_pool = []

for player in player_pool:
    if 'C' in player[2]:
        catcher_pool.append(player)
    if '1B' in player[2]:
        first_pool.append(player)
    if '2B' in player[2]:
        second_pool.append(player)
    if '3B' in player[2]:
        third_pool.append(player)
    if 'SS' in player[2]:
        short_pool.append(player)
    if 'OF' in player[2]:
        of_pool.append(player)

start = time.time()

for catcher in catcher_pool:
roster[0] = catcher[0]
if len(roster) != len(set(roster)):
    continue
for first_baseman in first_pool:
    roster[1] = first_baseman[0]
    if len(roster) != len(set(roster)):
        continue
    for second_baseman in second_pool:
        roster[2]= second_baseman[0]
        if len(roster) != len(set(roster)):
            continue
        for third_baseman in third_pool:
            roster[3] = third_baseman[0]
            if len(roster) != len(set(roster)):
                continue
            for shortstop in short_pool:
                roster[4] = shortstop[0]
                if len(roster) != len(set(roster)):
                    continue
                for outfielder1 in of_pool:
                    roster[5] = outfielder1[0]
                    if len(roster) != len(set(roster)):
                        continue
                    for outfielder2 in of_pool:
                        roster[6] = outfielder2[0]
                        if len(roster) != len(set(roster)):
                            continue
                        for outfielder3 in of_pool:
                            roster[7] = outfielder3[0]
                            if len(roster) != len(set(roster)):
                                continue
                            print(roster)

end = time.time()
elapsed = end - start
print(elapsed)

When I run this code, it seems to stop itself during the "for outfielder2" loop at some point. I can't seem to put my finger on exactly why it stops when it does, and how to fix it.

I realize you may need more information than this, like the contents of the pools it's pulling players from, but I don't want to overload the question at first if it turns out not to be necessary. I'll put that stuff in if needed.

Any ideas what I'm doing wrong, and how to make this more efficient? Thanks.

EDIT

Okay I pared my player_pool down to

['Ender Inciarte', '0.283', 'OF', '3900']
['A.J. Pollock', '0.304', 'OF', '4900']
['Jamie Romak', '0.349', 'OF', '2000']
['Adam Jones', '0.258', 'OF', '3700']
['Paul Goldschmidt', '0.343', '1B', '5600']
['Chris Davis', '0.306', '1B', '4300']
['Aaron Hill', '0.245', '2B/3B', '2400']
['Jimmy Paredes', '0.276', '1B/2B', '3000']
['Jake Lamb', '0.283', '3B', '3700']
['Manny Machado', '0.315', '3B', '4400']
['Welington Castillo', '0.31', 'C', '3800']
['Caleb Joseph', '0.266', 'C', '3200']
['Xander Bogaerts', '0.318', '3B/SS', '4300']
['Eugenio Suarez', '0.294', 'SS', '3000']

When I run the code with all of the if len(roster) != len(set(roster)) commented out, it correctly returns every combination (takes 48.3 seconds per the timer I wrote into it). But if I put the if len(roster) != len(set(roster)) back in, this is the output:

['Welington Castillo', 'Paul Goldschmidt', 'Aaron Hill', 'Jake Lamb', 'Xander Bogaerts', 'Ender Inciarte', 'A.J. Pollock', 'Jamie Romak']
['Welington Castillo', 'Paul Goldschmidt', 'Aaron Hill', 'Jake Lamb', 'Xander Bogaerts', 'Ender Inciarte', 'A.J. Pollock', 'Adam Jones']
['Welington Castillo', 'Paul Goldschmidt', 'Aaron Hill', 'Jake Lamb', 'Xander Bogaerts', 'Ender Inciarte', 'Jamie Romak', 'A.J. Pollock']
['Welington Castillo', 'Paul Goldschmidt', 'Aaron Hill', 'Jake Lamb', 'Xander Bogaerts', 'Ender Inciarte', 'Jamie Romak', 'Adam Jones']
0.03500199317932129
like image 446
jbf Avatar asked Aug 14 '15 23:08

jbf


1 Answers

You can optimize a lot by filtering the pools of the inner loops by the players you've already chosen in the outer loops. This is especially easy with sets.
You can also use combinations from itertools for your outerfield players to avoid another three loops.
The result looks like this:

import itertools as it
import time

player_pool = [['Ender Inciarte', '0.283', 'OF', '3900'],
               ['A.J. Pollock', '0.304', 'OF', '4900'],
               ['Jamie Romak', '0.349', 'OF', '2000'],
               ['Adam Jones', '0.258', 'OF', '3700'],
               ['Paul Goldschmidt', '0.343', '1B', '5600'],
               ['Chris Davis', '0.306', '1B', '4300'],
               ['Aaron Hill', '0.245', '2B/3B', '2400'],
               ['Jimmy Paredes', '0.276', '1B/2B', '3000'],
               ['Jake Lamb', '0.283', '3B', '3700'],
               ['Manny Machado', '0.315', '3B', '4400'],
               ['Welington Castillo', '0.31', 'C', '3800'],
               ['Caleb Joseph', '0.266', 'C', '3200'],
               ['Xander Bogaerts', '0.318', '3B/SS', '4300'],
               ['Eugenio Suarez', '0.294', 'SS', '3000']]

# create a player hash for later use, I hope player names are unique
player_hash = {p[0]: p for p in player_pool}

# use sets for the pools so that we can use set difference later on
catcher_pool = set()
first_pool = set()
second_pool = set()
third_pool = set()
short_pool = set()
of_pool = set()

# fill the pools only with player names
for player, stats in player_hash.items():
    if 'C' in stats[2]:
        catcher_pool.add(player)
    if '1B' in stats[2]:
        first_pool.add(player)
    if '2B' in stats[2]:
        second_pool.add(player)
    if '3B' in stats[2]:
        third_pool.add(player)
    if 'SS' in stats[2]:
        short_pool.add(player)
    if 'OF' in stats[2]:
        of_pool.add(player)

start = time.time()
playing = set()
all_rosters = []
roster = [None]*8

# create a little generator that only yields players from a pool
# which are currently not playing
def filtered(pool):
    for player in pool-playing:
        playing.add(player)
        yield player
        playing.remove(player)

with open('rosters.txt', 'w') as f:
    for catcher in filtered(catcher_pool):
        roster[0] = catcher
        for first_baseman in filtered(first_pool):
            roster[1] = first_baseman
            for second_baseman in filtered(second_pool):
                roster[2] = second_baseman
                for third_baseman in filtered(third_pool):
                    roster[3] = third_baseman
                    for shortstop in filtered(short_pool):
                        roster[4] = shortstop
                        for outfielders in it.combinations(of_pool-playing, 3):
                            roster[5:8] = outfielders
                            # append result to a list instead of printing
                            # this is a lot faster
                            all_rosters.append(roster[:])
                            # if our list is bigger than 1e6, write it to file
                            if len(all_rosters) > 10**6:
                                f.writelines(repr(roster)+'\n'
                                             for roster
                                             in all_rosters)
                                all_rosters = []
    # don't forget to write the last batch which did not reach 1e6
    f.writelines(repr(roster)+'\n' for roster in all_rosters)

end = time.time()
elapsed = end - start

# make sure no roster contains double names
assert all(len(set(roster)) == len(roster) == 8 for roster in all_rosters)

# make sure all rosters are unique
assert len(all_rosters) == len(set(tuple(roster) for roster in all_rosters))

print(len(all_rosters))
print(elapsed)

Prints:

[a lot of rosters]
232
0.00205016136169

But the results are not ordered. I gues that isn't a problem. If you want to sort them, just sort all_rosters after they were generated.

like image 182
swenzel Avatar answered Nov 11 '22 11:11

swenzel