Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Complex time-series statistical aggregation involving polymorphic associations

Ok. Bear with me, as I need to provide a lot of contextual detail before I can solicit a reasonable answer to my question.

I have a site that allows you to make daily stock picks. The way it works is that you're prompted to make picks between companies that are facing-off for the day. For example, GE vs. IBM. You can make two types of picks: Performance (which stock will perform better?) and Total Volume (will the combined stocks trade at volumes higher or lower than X?). You're given 100 virtual dollars each day to make picks.

Ultimately, our goal here is to track which user makes the most money per pick in various categories (explained below) over the following periods of time: 5 days, 15 days, 30 days, 90 days, 180 days, 1 year, all-time. It's very simple to calculate how much money is made per pick. It's the total money made (or lost) / number of picks.

Now, each company that the user makes a pick on falls under a categorical hierarchy. Generically, the categorical hierarchy looks like this:

Division --> Major Group --> Industry Group --> Classification --> Company

Here are some examples:

  • Mining --> Metal Mining --> Iron Ores --> Brown Ore Mining --> Company A
  • Mining --> Metal Mining --> Iron Ores --> Brown Ore Mining --> Company B
  • Mining --> Metal Mining --> Iron Ores --> Limonite Mining --> Company C
  • Mining --> Metal Mining --> Iron Ores --> Limonite Mining --> Company D
  • Manufacturing --> Tobacco Products --> Cigars --> Stogies --> Company E
  • Manufacturing --> Tobacco Products --> Cigars --> Stogies --> Company F
  • Manufacturing --> Tobacco Products --> Cigars --> Cigarillos --> Company G
  • Manufacturing --> Tobacco Products --> Cigars --> Cigarillos --> Company H
  • …and so on…

There's a model for each category (and corresponding table, of course), and they are associated (think foreign_key) just like you see above.

There is a model for Matchup, with each record representing which companies are facing-off for the day. Each record keeps track of the starting and final stock prices for each company, as well as the the total trade volume.

Each Matchup has one or more :pick_prices that can change throughout the day. Normally, each matchup has a Performance Pick Price and a Total Volume Pick Price. The price determines what the pick will cost you and how much you earn for a correct pick. (Now, this is all just background info. You don't need to worry about those particular price calculations.)

At the end of the trading day, the user's picks are resolved. Picks are represented in a Pick model, with the following attributes:

  • user_id
  • amount_spent (e.g., $10)
  • result (e.g., WON, LOST)
  • pick (e.g., company A)
  • matchup_id
  • pick_price_id
  • amount_won
  • resolved (true or false)
  • created_at
  • updated_at

Currently, when each pick is resolved, another table is updated called pick_records, which has the following attributes:

  • user_id
  • recordable_id
  • recordable_type (Division or Major Group or Industry Group or Classification or Company)
  • picks (total picks made, regardless of pick type)
  • won (total picks won, regardless of pick type)
  • lost (total picks lost, regardless of pick type)
  • money (total money won)
  • money_per_pick (money / picks)
  • performance_picks
  • performance_won
  • performance_lost
  • performance_money
  • performance_money_per_pick
  • volume_picks
  • volume_won
  • volume_lost
  • volume_money
  • volume_money_per_pick
  • created_at
  • updated_at

As you can tell, this is a polymorphic model. The table aggregates the all-time pick record statistics.

So now here's the challenge:

Given the existing design, what do I have to do so that I can capture the user's pick records over the following periods of time: 5 days, 15 days, 30 days, 90 days, 180 days, 1 year, all-time? It needs to be simple, efficient, and fast!

I'm currently running Rails 2.3.11 on MySQL DB.

like image 908
keruilin Avatar asked Jun 04 '11 19:06

keruilin


1 Answers

I don't see the need for table pick_records.
You can do a query like this for any number of days:

SELECT 
   user_id
   ,sum(amount_spent) 
   ,sum(IF(result = 'WON',1,0)) as WON_count
   ,sum(IF(result = 'LOST',1,0)) as LOST_count
   ,pick 
   /*matchup_id*/
   ,sum(pc.price) as price
   ,sum(IF(result = 'WON'),amount_won,0)) as amount_won
   ,sum(IF(result = 'LOST'),amount_won,0)) as amount_lost
   ,sum(IF(result = 'WON'),amount_won,-amount_won)) as nett_amount
FROM picks
INNER JOIN pick_price pc ON (pc.id = user.pick_price_id)
WHERE created_at BETWEEN DATE_SUB(NOW(), INTERVAL 5 DAY) AND NOW()
  AND resolved = 'true'
GROUP BY user_id, pick
like image 80
Johan Avatar answered Sep 27 '22 18:09

Johan