I can insert a column into a dataframe that z-scores another column like this:
[1] df.insert(<loc>, column='ZofA', value=(df['A']-df['A'].mean())/df['A'].std())
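As a point of reference, here is a minimal runnable version of [1] (no grouping yet). It assumes a small hypothetical dataframe, and it fills `<loc>` with "the position right after 'A'" purely for illustration:

```python
import pandas as pd

# Hypothetical sample data: two grouping columns and a numeric column 'A'.
df = pd.DataFrame({
    "C1": ["x"] * 6 + ["y"] * 6,
    "C2": (["p"] * 3 + ["q"] * 3) * 2,
    "A": [1, 2, 3, 10, 20, 30, 5, 7, 9, 100, 150, 200],
})

# Whole-column z-score; Series.std() is the sample standard deviation (ddof=1).
z = (df["A"] - df["A"].mean()) / df["A"].std()

# Illustrative choice for <loc>: insert the new column just after 'A'.
df.insert(df.columns.get_loc("A") + 1, column="ZofA", value=z)
print(df)
```

Here every row is standardized against the mean and standard deviation of the entire column, ignoring C1 and C2.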
I can do a simple reduction of a column grouped by 2 other columns like this:
[2] df.groupby(['C1', 'C2'])['A'].mean()
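A quick sketch of what [2] returns, using hypothetical data: a reduction produces one value per (C1, C2) group, indexed by the group keys, so its length does not match the dataframe and it cannot be dropped in as a new column.

```python
import pandas as pd

# Hypothetical sample data: four (C1, C2) groups of three rows each.
df = pd.DataFrame({
    "C1": ["x"] * 6 + ["y"] * 6,
    "C2": (["p"] * 3 + ["q"] * 3) * 2,
    "A": [1, 2, 3, 10, 20, 30, 5, 7, 9, 100, 150, 200],
})

means = df.groupby(["C1", "C2"])["A"].mean()
print(means)            # one row per group, MultiIndex (C1, C2)
print(len(means), len(df))  # 4 group means vs. 12 original rows
```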
I tried to replace the simple mean() call in [2] with the z-score calculation from [1], but couldn't figure out how to do it, including with .apply. For example, this fails:
[3] df.groupby(['C1', 'C2']).apply((df['A']-df['A'].mean())/df['A'].std())
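One way to see why [3] fails, sketched on hypothetical data: Python evaluates the parenthesized expression first, over the whole column, so what reaches .apply is an already-computed Series rather than a function it can call on each group. (Even if it were accepted, those z-scores would not be per-group.) The exact exception raised may vary between pandas versions, so the sketch catches it generically.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "C1": ["x"] * 6 + ["y"] * 6,
    "C2": (["p"] * 3 + ["q"] * 3) * 2,
    "A": [1, 2, 3, 10, 20, 30, 5, 7, 9, 100, 150, 200],
})

# This is what [3] actually passes to .apply: a finished, ungrouped Series.
arg = (df["A"] - df["A"].mean()) / df["A"].std()
print(type(arg).__name__, callable(arg))  # a Series, and not callable

failed = False
try:
    # GroupBy.apply tries to call its argument on each group, which fails here.
    df.groupby(["C1", "C2"]).apply(arg)
except Exception as exc:
    failed = True
    print("apply failed:", type(exc).__name__)
```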
So my first problem is that I apparently don't know how to compute z-scores within groups.
My second problem is that I want to combine (1) inserting a new column ('ZofA') into a dataframe that holds z-scores of another column ('A') with (2) having those z-scores calculated within groups defined by two other columns ('C1', 'C2'). And (3) I'd like to do all of this in one df.insert() statement. Am I just messing up my parentheses and brackets and whatnot, or am I trying to do too much in one statement? Thanks!
Thanks for the pointer to the documentation. For anyone who is curious, here is the solution. First, put the z-score calculation into a lambda:
zscore = lambda x: (x - x.mean()) / x.std()
The magic ingredient is .transform. Just write the insert statement like this:
df.insert(<loc>, 'ZofA', df.groupby(['C1', 'C2'])['A'].transform(zscore))
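A self-contained run of this solution on hypothetical data, again using "right after 'A'" as an illustrative `<loc>`. The key property is that .transform returns a Series aligned to the original rows (same index and length as df), which is why it can go straight into df.insert:

```python
import pandas as pd

# Hypothetical sample data: four (C1, C2) groups of three rows each.
df = pd.DataFrame({
    "C1": ["x"] * 6 + ["y"] * 6,
    "C2": (["p"] * 3 + ["q"] * 3) * 2,
    "A": [1, 2, 3, 10, 20, 30, 5, 7, 9, 100, 150, 200],
})

zscore = lambda x: (x - x.mean()) / x.std()

# transform calls zscore on each group's 'A' values and stitches the results
# back together in the original row order.
df.insert(df.columns.get_loc("A") + 1, "ZofA",
          df.groupby(["C1", "C2"])["A"].transform(zscore))
print(df)  # each group's three evenly spaced values map to -1, 0, 1
```

One caveat: x.std() is the sample standard deviation, so a group containing a single row gets NaN for its z-score.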
The solution is indeed in the "Group By: split-apply-combine" documentation; just scroll about halfway down to the "Transformation" section. I ignored the material about the date key and plugged my grouping columns directly into the groupby call.
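A variation not taken from the original answer: .transform also accepts the names of built-in aggregations such as "mean" and "std", which broadcasts each group's statistic back onto its rows. Building the z-score from those avoids calling a Python lambda once per group and is typically faster when there are many groups. The sketch below, on hypothetical data, checks that it matches the lambda version:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "C1": ["x"] * 6 + ["y"] * 6,
    "C2": (["p"] * 3 + ["q"] * 3) * 2,
    "A": [1, 2, 3, 10, 20, 30, 5, 7, 9, 100, 150, 200],
})

g = df.groupby(["C1", "C2"])["A"]

# Group mean and sample std (ddof=1), each repeated for every row of its group.
z_fast = (df["A"] - g.transform("mean")) / g.transform("std")

# Same result as the per-group lambda approach.
z_lambda = g.transform(lambda x: (x - x.mean()) / x.std())
print((z_fast - z_lambda).abs().max())
```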