Difference to group mean in a pandas data frame?

Let's assume I count how many oranges (Orange) and apples (Apple) people (id) eat in a certain time period. I also know whether they are young or old (group). The pandas DataFrame might look like this:

import pandas as pd

df = pd.DataFrame({'id' : ['1','2','3','7'],
                   'group' : ['Young', 'Young', 'Old', 'Old'],
                   'Apple' : [7,2,5,4],
                   'Orange' : [3,6,4,4],
                   })

We can easily compute means using groupby(), e.g.:

df.Apple.groupby(df.group).mean()

outputs

Old      4.5
Young    4.5

But let's say I want to find, for each individual, how much the number of apples and oranges consumed differs from the mean of that individual's group.

That is, the output should be

df = pd.DataFrame({'id' : ['1','2','3','7'],
                   'group' : ['Young', 'Young', 'Old', 'Old'],
                   'Apple' : [7,2,5,4],
                   'Orange' : [3,6,4,4],
                   'Apple Difference' : [2.5, -2.5, 0.5, -0.5],
                   })

Is there a way to do this with pandas/numpy? Sorry for the rookie question. Best /R

asked Jul 24 '17 14:07 by Rachel



1 Answer

You need transform, which returns the group mean as a Series with the same length (and index) as df, and then subtract it from the column with sub:

print (df.groupby('group')['Apple'].transform('mean'))
0    4.5
1    4.5
2    4.5
3    4.5
Name: Apple, dtype: float64
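
Both groups happen to have the same Apple mean (4.5), so the output above does not make it obvious that transform works per group. A minimal sketch using the Orange column from the question may show the contrast with a plain aggregation more clearly; the names agg_mean and row_mean are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3', '7'],
                   'group': ['Young', 'Young', 'Old', 'Old'],
                   'Apple': [7, 2, 5, 4],
                   'Orange': [3, 6, 4, 4]})

# Aggregation: one value per group, indexed by the group label
agg_mean = df.groupby('group')['Orange'].mean()

# transform: one value per original row, indexed like df,
# so it can be subtracted from df['Orange'] directly
row_mean = df.groupby('group')['Orange'].transform('mean')

print(agg_mean)
print(row_mean)
```

Here the aggregated result has two entries (Old and Young), while the transformed result has four entries that line up with the rows of df.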

df = pd.DataFrame({'id' : ['1','2','3','7'],
                   'group' : ['Young', 'Young', 'Old', 'Old'],
                   'Apple' : [7,2,5,4],
                   'Orange' : [3,6,4,4],
                   })
df['Apple Difference'] = df['Apple'].sub(df.groupby('group')['Apple'].transform('mean'))
print(df)
   Apple  Orange  group id  Apple Difference
0      7       3  Young  1               2.5
1      2       6  Young  2              -2.5
2      5       4    Old  3               0.5
3      4       4    Old  7              -0.5
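
The question asks about both apples and oranges, so the same idea can probably be applied to several columns in one step. The following is a sketch under that assumption, not part of the original answer; cols, group_means, diff and result are illustrative names, and the ' Difference' suffix simply mirrors the column name used above.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3', '7'],
                   'group': ['Young', 'Young', 'Old', 'Old'],
                   'Apple': [7, 2, 5, 4],
                   'Orange': [3, 6, 4, 4]})

cols = ['Apple', 'Orange']

# Per-row group means for every selected column (same shape as df[cols])
group_means = df.groupby('group')[cols].transform('mean')

# Subtract column-wise and rename the results, e.g. 'Apple' -> 'Apple Difference'
diff = df[cols].sub(group_means).add_suffix(' Difference')

# Attach the difference columns to the original frame (aligned on the index)
result = df.join(diff)
print(result)
```

Using join leaves df itself unchanged; assigning back with df[diff.columns] = diff would be an alternative if modifying df in place is preferred.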
answered Sep 21 '22 06:09 by jezrael