 

How to calculate cumulative moving average in Python/SQLAlchemy/Flask

I'll give some context so it makes sense. I'm capturing Customer Ratings for Products in a table (Rating) and want to be able to return a Cumulative Moving Average of the ratings based on time.

A basic example follows taking a rating per day:

02 FEB - Rating: 5 - Cum Avg: 5
03 FEB - Rating: 4 - Cum Avg: (5+4)/2 = 4.5
04 FEB - Rating: 1 - Cum Avg: (5+4+1)/3 = 3.33
05 FEB - Rating: 5 - Cum Avg: (5+4+1+5)/4 = 3.75
Etc...
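
For reference, the arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the naive calculation over an in-memory list; the function name `cumulative_averages` is made up for illustration and nothing here touches the database.

```python
from itertools import accumulate


def cumulative_averages(ratings):
    """Return the cumulative average after each rating, in order."""
    # accumulate() yields the running totals: 5, 9, 10, 15 for the example.
    running_totals = accumulate(ratings)
    # Divide each running total by the number of ratings seen so far.
    return [total / count for count, total in enumerate(running_totals, start=1)]


# The four daily ratings from the example (02 FEB to 05 FEB).
for rating, average in zip([5, 4, 1, 5], cumulative_averages([5, 4, 1, 5])):
    print(f"Rating: {rating} - Cum Avg: {average:.2f}")
```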

I'm trying to think of an approach that won't scale horribly.

My current idea is to have a function that is triggered when a row is inserted into the Rating table and works out the Cum Avg based on the previous row for that product.

So the fields would be something like:

TABLE: Rating
| RatingId | DateTime | ProdId | RatingVal | RatingCnt | CumAvg |

But this seems like a fairly dodgy way to store the data.

What would be the (or any) way to accomplish this? If I were to use a 'trigger' of sorts, how would I go about doing that in SQLAlchemy?

Any and all advice appreciated!

mal-wan asked Aug 23 '11 07:08


1 Answer

I don't know about SQLAlchemy, but I might use an approach like this:

  • Store the cumulative average and rating count separately from individual ratings.
  • Every time you get a new rating, update the cumulative average and rating count:
    • new_count = old_count + 1
    • new_average = ((old_average * old_count) + new_rating) / new_count
  • Optionally, store a row for each new rating.
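
As a rough illustration, the update rule above could be written as a small Python function. The name `update_average` and the tuple return value are my own choices for the sketch, not part of any library.

```python
def update_average(old_average, old_count, new_rating):
    """Fold one new rating into a stored cumulative average.

    Returns the pair (new_average, new_count).
    """
    new_count = old_count + 1
    new_average = ((old_average * old_count) + new_rating) / new_count
    return new_average, new_count


# Start with no ratings, then apply the ratings from the question one by one.
average, count = 0.0, 0
for rating in [5, 4, 1, 5]:
    average, count = update_average(average, count, rating)
    print(f"Rating: {rating} - Count: {count} - Cum Avg: {average:.2f}")
```

Each update only needs the stored average and count, so the cost per new rating stays constant no matter how many ratings already exist.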

Updating the average and rating count could be done with a single SQL statement.
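
Here is a minimal sketch of what that single statement might look like, using Python's built-in `sqlite3` module rather than SQLAlchemy. The table `rating_summary`, its columns, and the helper names are assumptions made up for this example; adapt them to your own schema.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical summary table: one row per product.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS rating_summary (
            prod_id    INTEGER PRIMARY KEY,
            rating_cnt INTEGER NOT NULL DEFAULT 0,
            cum_avg    REAL    NOT NULL DEFAULT 0.0
        )
        """
    )


def add_rating(conn, prod_id, rating):
    params = {"prod_id": prod_id, "rating": rating}
    # Make sure a summary row exists for this product (count 0, average 0.0).
    conn.execute(
        "INSERT OR IGNORE INTO rating_summary (prod_id) VALUES (:prod_id)",
        params,
    )
    # The single UPDATE: new average and new count in one statement.
    # cum_avg is assigned first so the expression still sees the old
    # rating_cnt on engines that evaluate assignments left to right.
    conn.execute(
        """
        UPDATE rating_summary
        SET cum_avg    = (cum_avg * rating_cnt + :rating) / (rating_cnt + 1),
            rating_cnt = rating_cnt + 1
        WHERE prod_id = :prod_id
        """,
        params,
    )
    conn.commit()


def get_summary(conn, prod_id):
    # Returns (rating_cnt, cum_avg), or None if the product has no row yet.
    return conn.execute(
        "SELECT rating_cnt, cum_avg FROM rating_summary WHERE prod_id = ?",
        (prod_id,),
    ).fetchone()
```

Usage would be something like calling `create_schema(conn)` once on a connection from `sqlite3.connect(":memory:")`, then `add_rating(conn, 1, 5)` for each new rating, and `get_summary(conn, 1)` to read the current count and average. Because the arithmetic happens inside the database, two concurrent ratings for the same product should not overwrite each other the way a read-then-write in application code could.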

ʇsәɹoɈ answered Sep 26 '22 23:09