
Adding a grouped-by z-score column to a pandas DataFrame

I can insert a column into a dataframe that z-scores another column like this:

[1] df.insert(<loc>, column='ZofA', value=(df['A']-df['A'].mean())/df['A'].std())

I can do a simple reduction of a column grouped by 2 other columns like this:

[2] df.groupby(['C1', 'C2'])['A'].mean()

I tried to replace the simple mean() function in [2] with the zscore function in [1], but couldn't figure out how to do it, including with .apply -- e.g. this fails:

[3] df.groupby(['C1', 'C2']).apply((df['A']-df['A'].mean())/df['A'].std())

So my first problem is I apparently don't know how to create a zscore column with grouping.

My second problem is that I want to combine (1) inserting a new column into a dataframe ('ZofA') that holds z-scores from another column ('A'), with (2) having those z-scores calculated within groups defined by two other columns ('C1', 'C2'). And (3) I'd like to do all this inside one df.insert() statement. Am I just messing up my parentheses and brackets and what-not, or am I trying to do too much in one statement? Thanks!

steve---g asked Sep 09 '16 23:09


1 Answer

Thanks for the pointer to the documentation. For any who are curious, I thought I'd post the solution. First, put the zscore calculation into a lambda:

zscore = lambda x: (x - x.mean()) / x.std()

The magic ingredient is .transform. Just write the insert statement like this:

df.insert(<loc>, 'ZofA', df.groupby(['C1', 'C2'])['A'].transform(zscore))
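For anyone who wants to try it, here is a minimal sketch on made-up data. The values in the frame and the insert position 3 are assumptions for illustration; only the column names 'C1', 'C2', 'A' and 'ZofA' come from the question.

```python
import pandas as pd

# Hypothetical data: two grouping columns and one value column.
df = pd.DataFrame({
    "C1": ["a", "a", "a", "b", "b", "b"],
    "C2": ["x", "x", "x", "y", "y", "y"],
    "A":  [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# z-score of a Series; Series.std() uses the sample standard deviation (ddof=1).
zscore = lambda x: (x - x.mean()) / x.std()

# transform applies zscore to each (C1, C2) group and returns a Series
# aligned to df's index, so it can be handed straight to insert.
# Position 3 is an arbitrary choice here (it places the column last).
df.insert(3, "ZofA", df.groupby(["C1", "C2"])["A"].transform(zscore))

print(df)
```

With this data each group is z-scored against its own mean and standard deviation, so the rows of both groups come out as -1, 0, 1 even though the raw values differ by a factor of ten.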

The solution is indeed in the "Group By: split-apply-combine" document. You just have to scroll down about halfway to the "Transformation" section. I ignored the stuff about the date key and just plugged my grouping columns directly into the groupby statement.
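As far as I can tell, the reason [2] can't be dropped into an insert while .transform can is the shape of the result: an aggregation such as mean() returns one value per group, whereas transform returns one value per original row, on the original index. And [3] fails because .apply expects a function but is handed an already-computed Series. A small sketch of the shape difference, again on hypothetical data:

```python
import pandas as pd

# Hypothetical data, same column names as in the question.
df = pd.DataFrame({
    "C1": ["a", "a", "a", "b", "b", "b"],
    "C2": ["x", "x", "x", "y", "y", "y"],
    "A":  [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

grouped = df.groupby(["C1", "C2"])["A"]

per_group = grouped.mean()           # reduction: one value per (C1, C2) pair
per_row = grouped.transform("mean")  # group mean broadcast back to every row

print(len(per_group))                  # number of groups
print(len(per_row))                    # number of rows in df
print(per_row.index.equals(df.index))  # transform keeps df's index

# The same z-score written without a lambda, using the built-in
# "mean" and "std" transforms. Plain assignment appends the column
# at the end instead of at a chosen position.
df["ZofA"] = (df["A"] - grouped.transform("mean")) / grouped.transform("std")
```

Plain assignment is enough if the column position doesn't matter; df.insert is only needed to place the new column at a particular location.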

steve---g answered Nov 23 '22 18:11