Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Excel Solver in Python [closed]

Tags:

python

I'm trying to implement something like this

http://office.microsoft.com/en-us/excel-help/using-solver-to-rate-sports-teams-HA001124601.aspx

in python with python libraries only (not calling Excel solver).

Can someone point me to the right libraries to be using + some dive-in tutorials to get started ?

like image 655
haha Avatar asked Jan 08 '11 14:01

haha


2 Answers

You are looking for NumPy (matrix manipulation and number-crunching) and SciPy (optimization). To get started, see https://stackoverflow.com/questions/4375094/numpy-learning-resources

I worked out the given example as follows:

  • I opened the sample Excel files in OpenOffice
  • I copied the team-data (without headers) to a new sheet and saved as teams.csv
  • I copied the game-data (without headers) to a new sheet and saved as games.csv

then in Python:

import csv
import numpy
import scipy.optimize

def readCsvFile(fname):
    with open(fname, 'r') as inf:
        return list(csv.reader(inf))

# Get team data
team = readCsvFile('teams.csv')  # list of num,name
numTeams = len(team)

# Get game data
game = readCsvFile('games.csv')  # list of game,home,away,homescore,awayscore
numGames = len(game)

# Now, we have the NFL teams for 2002 and data on all games played.
# From this, we wish to forecast the score of future games.
# We are going to assume that each team has an inherent performance-factor,
# and that there is a bonus for home-field advantage; then the
# relative final score between a home team and an away team can be
# calculated as (home advantage) + (home team factor) - (away team factor)

# First we create a matrix M which will hold the data on
# who played whom in each game and who had home-field advantage.
m_rows = numTeams + 1
m_cols = numGames
M = numpy.zeros( (m_rows, m_cols) )

# Then we create a vector S which will hold the final
# relative scores for each game.
s_cols = numGames
S = numpy.zeros(s_cols)

# Loading M and S with game data
for col,gamedata in enumerate(game):
    gameNum,home,away,homescore,awayscore = gamedata
    # In the csv data, teams are numbered starting at 1
    # So we let home-team advantage be 'team 0' in our matrix
    M[0, col]         =  1.0   # home team advantage
    M[int(home), col] =  1.0
    M[int(away), col] = -1.0
    S[col]            = int(homescore) - int(awayscore)


# Now, if our theoretical model is correct, we should be able
# to find a performance-factor vector W such that W*M == S
#
# In the real world, we will never find a perfect match,
# so what we are looking for instead is W which results in S'
# such that the least-mean-squares difference between S and S'
# is minimized.

# Initial guess at team weightings:
# 2.0 points home-team advantage, and all teams equally strong
init_W = numpy.array([2.0]+[0.0]*numTeams)  

def errorfn(w,m,s):
    return w.dot(m) - s

W = scipy.optimize.leastsq(errorfn, init_W, args=(M,S))

homeAdvantage = W[0][0]   # 2.2460937500005356
teamStrength = W[0][1:]   # numpy.array([-151.31111318, -136.36319652, ... ])

# Team strengths have meaning only by linear comparison;
# we can add or subtract any constant to all of them without
# changing the meaning.
# To make them easier to understand, we want to shift them
# such that the average is 0.0
teamStrength -= teamStrength.mean()

for t,s in zip(team,teamStrength):
    print "{0:>10}: {1: .7}".format(t[1],s)

results in

       Ari: -9.8897569
       Atl:  5.0581597
      Balt: -2.1178819
      Buff: -0.27413194
  Carolina: -3.2720486
      Chic: -5.2654514
      Cinn: -10.503646
      Clev:  1.2338542
      Dall: -8.4779514
       Den:  4.8901042
       Det: -9.1727431
        GB:  3.5800347
      Hous: -9.4390625
      Indy:  1.1689236
      Jack: -0.2015625
        KC:  6.1112847
     Miami:  6.0588542
      Minn: -3.0092014
        NE:  4.0262153
        NO:  2.4251736
       NYG:  0.82725694
       NYJ:  3.1689236
       Oak:  10.635243
      Phil:  8.2987847
      Pitt:  2.6994792
 St. Louis: -3.3352431
 San Diego: -0.72065972
        SF:  0.63524306
   Seattle: -1.2512153
     Tampa:  8.8019097
      Tenn:  1.7640625
      Wash: -4.4529514

which is the same result shown in the spreadsheet.

like image 85
Hugh Bothwell Avatar answered Oct 11 '22 22:10

Hugh Bothwell


This page lists a number of solver libraries for Python which you might use:

  • Python solvers
like image 35
ars Avatar answered Oct 11 '22 22:10

ars