
Python data structure recommendation?

Tags:

python

I currently have a structure that is a dict: each value is a list of numeric values. Each of these lists is identified by what (to borrow a SQL idiom) you could call a primary key made up of three values: a year, a team identifier, and a player identifier. This tuple is the key for the dict.

So you can get a unique row by passing in a value for the year, team ID, and player ID like so:

statline = stats[(2001, 'SEA', 'suzukic01')]

Which yields something like

[305, 20, 444, 330, 45]

I'd like to alter this data structure so it can be quickly summed by any one of these three keys: you could easily total a given index in the numeric lists by passing in ONE of year, player ID, or team ID, and then the index. I want to be able to do something like

hr_total = stats[year=2001, idx=3]

Where that idx of 3 corresponds to the third column in the numeric list(s) that would be retrieved.

Any ideas?

Wells asked Jan 23 '23 00:01


2 Answers

Read up on Data Warehousing. Any book.

Read up on Star Schema Design. Any book. Seriously.

You have several dimensions: Year, Player, Team.

You have one fact: score.

You want to have a structure like this.

You then want to create a set of dimension indexes like this.

import collections

# One index per dimension: maps a dimension value to the list of facts that have it.
years = collections.defaultdict(list)
players = collections.defaultdict(list)
teams = collections.defaultdict(list)

Your fact table can be a collections.namedtuple. You can use something like this.

class ScoreFact:
    def __init__(self, year, player, team, score):
        self.year = year
        self.player = player
        self.team = team
        self.score = score
        # Register this fact in each module-level dimension index.
        years[self.year].append(self)
        players[self.player].append(self)
        teams[self.team].append(self)

Now you can find all items in a given dimension value. It's a simple list attached to a dimension value.

years[2001] is the list of all facts for the given year.

players['suzukic01'] is the list of all facts for the given player.

And so on. You can simply use sum() over the score attribute to add them up, e.g. sum(x.score for x in years[2001]). A multi-dimensional query is something like this.

[x for x in players['suzukic01'] if x.year == 2001]
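
Here is a minimal sketch of how the same idea might be adapted to the stat lists from the question. It assumes each fact carries the whole numeric list rather than a single score. The names StatFact, add_fact, and total are hypothetical, and the second sample row is made up for illustration.

```python
import collections

# Hypothetical fact record: the three dimension values plus the numeric list.
StatFact = collections.namedtuple("StatFact", ["year", "team", "player", "stats"])

# One index per dimension, mapping a dimension value to the facts that have it.
indexes = {
    "year": collections.defaultdict(list),
    "team": collections.defaultdict(list),
    "player": collections.defaultdict(list),
}

def add_fact(year, team, player, stats):
    """Create a fact and register it under each of its dimension values."""
    fact = StatFact(year, team, player, stats)
    for dimension, index in indexes.items():
        index[getattr(fact, dimension)].append(fact)
    return fact

def total(idx, **criteria):
    """Sum column idx over the facts matching exactly one dimension=value pair."""
    (dimension, value), = criteria.items()
    return sum(fact.stats[idx] for fact in indexes[dimension].get(value, []))

add_fact(2001, "SEA", "suzukic01", [305, 20, 444, 330, 45])
add_fact(2001, "SEA", "example01", [100, 5, 200, 150, 10])  # made-up row

hr_total = total(3, year=2001)  # 330 + 150
```

Looking values up with .get() instead of indexing keeps a query for a missing year, team, or player from adding an empty entry to the defaultdict.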
S.Lott answered Jan 25 '23 22:01


Put your data into SQLite, and use its relational engine to do the work. You can create an in-memory database and not even have to touch the disk.
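
A rough sketch of what that could look like with the standard-library sqlite3 module. The table name stats and the column names c0 through c4 are assumptions, since the question does not name the numeric columns, and the second row is made up for illustration.

```python
import sqlite3

# ":memory:" keeps the whole database in RAM; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats (year INTEGER, team TEXT, player TEXT, "
    "c0 INTEGER, c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 INTEGER, "
    "PRIMARY KEY (year, team, player))"
)

# The existing dict, keyed by (year, team, player) as in the question.
stats = {
    (2001, "SEA", "suzukic01"): [305, 20, 444, 330, 45],
    (2001, "SEA", "example01"): [100, 5, 200, 150, 10],  # made-up row
}

# Flatten each key tuple and value list into one eight-column row.
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [key + tuple(values) for key, values in stats.items()],
)

# Total of column c3 for one year; swap the WHERE clause to slice by team or player.
(hr_total,) = conn.execute(
    "SELECT SUM(c3) FROM stats WHERE year = ?", (2001,)
).fetchone()
```

Filtering on more than one dimension is then just a matter of adding conditions to the WHERE clause.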

Ned Batchelder answered Jan 25 '23 22:01