Let's assume I count how many oranges (Orange) and apples (Apple) people (id) eat in a certain time period. I also know if they are young or old (group). The pandas dataframe might look like this:
import pandas as pd

df = pd.DataFrame({'id' : ['1','2','3','7'],
                   'group' : ['Young', 'Young', 'Old', 'Old'],
                   'Apple' : [7,2,5,4],
                   'Orange' : [3,6,4,4],
                  })
We can easily compute group means using groupby(), e.g.:
df.Apple.groupby(df.group).mean()
outputs
Old 4.5
Young 4.5
But let's say I want to find, for each individual, how much the number of apples and oranges consumed differs from the group mean.
That is, the output should be
df = pd.DataFrame({'id' : ['1','2','3','7'],
'group' : ['Young', 'Young', 'Old', 'Old'],
'Apple' : [7,2,5,4],
'Orange' : [3,6,4,4],
'Apple Difference' : [2.5, -2.5, 0.5, -0.5],
})
Is there a way to do this with pandas/numpy? Sorry for the rookie question. Best, /R
A groupby() operation involves a combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
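As a rough illustration of the split, apply, and combine steps, the sketch below reproduces the group mean from the question by hand and compares it with the built-in call. The names `pieces`, `manual`, and `builtin` are made up for this example; only the data frame comes from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '2', '3', '7'],
    'group': ['Young', 'Young', 'Old', 'Old'],
    'Apple': [7, 2, 5, 4],
})

pieces = {}  # hypothetical helper dict: group label -> mean
# Split: iterating over a groupby yields (label, sub-frame) pairs.
for name, part in df.groupby('group'):
    # Apply: compute the statistic on each sub-frame separately.
    pieces[name] = part['Apple'].mean()

# Combine: collect the per-group results into one Series.
manual = pd.Series(pieces, name='Apple')

# The built-in call performs the same three steps in one go.
builtin = df.groupby('group')['Apple'].mean()
```

Both `manual` and `builtin` hold one value per group, which is why neither can be subtracted row by row from the original column without first being aligned back to the rows.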
You need transform to get the group mean as a Series with the same length as df, then subtract it with sub:
print (df.groupby('group')['Apple'].transform('mean'))
0 4.5
1 4.5
2 4.5
3 4.5
Name: Apple, dtype: float64
df = pd.DataFrame({'id' : ['1','2','3','7'],
'group' : ['Young', 'Young', 'Old', 'Old'],
'Apple' : [7,2,5,4],
'Orange' : [3,6,4,4],
})
df['Apple Difference'] = df['Apple'].sub(df.groupby('group')['Apple'].transform('mean'))
print (df)
   Apple  Orange  group id  Apple Difference
0      7       3  Young  1               2.5
1      2       6  Young  2              -2.5
2      5       4    Old  3               0.5
3      4       4    Old  7              -0.5
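The question mentions both apples and oranges, so here is a minimal sketch of the same idea applied to several columns at once. It assumes the "<column> Difference" naming from the question should carry over to Orange; `cols`, `group_means`, `diff`, and `result` are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '2', '3', '7'],
    'group': ['Young', 'Young', 'Old', 'Old'],
    'Apple': [7, 2, 5, 4],
    'Orange': [3, 6, 4, 4],
})

cols = ['Apple', 'Orange']  # columns to compare against their group mean

# transform('mean') returns a frame shaped like df[cols], with each
# row holding the mean of that row's group.
group_means = df.groupby('group')[cols].transform('mean')

# Subtract column-wise, then rename to "<column> Difference".
diff = df[cols].sub(group_means).add_suffix(' Difference')

# Attach the new columns to the original frame (aligned on the index).
result = df.join(diff)
print(result)
```

Selecting a list of columns before `transform` keeps the non-numeric `id` column out of the mean calculation.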