Difference to group mean in a pandas data frame?

Let's assume I count how many oranges (Orange) and apples (Apple) people (id) eat in a certain time period. I also know whether they are young or old (group). The pandas DataFrame might look like this:

import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3', '7'],
                   'group': ['Young', 'Young', 'Old', 'Old'],
                   'Apple': [7, 2, 5, 4],
                   'Orange': [3, 6, 4, 4],
                   })

We can easily compute group means using groupby(). For example:

df.Apple.groupby(df.group).mean()

outputs

Old      4.5
Young    4.5

But let's say I want to find, for each individual, how much the number of apples and oranges consumed differs from the mean of that individual's group.

That is, the output should be

df = pd.DataFrame({'id': ['1', '2', '3', '7'],
                   'group': ['Young', 'Young', 'Old', 'Old'],
                   'Apple': [7, 2, 5, 4],
                   'Orange': [3, 6, 4, 4],
                   'Apple Difference': [2.5, -2.5, 0.5, -0.5],
                   })

Is there a way to do this with pandas/numpy? Sorry for the rookie question. Best, R

asked Jul 24 '17 14:07

Rachel

1 Answer

You need transform('mean'), which returns the group mean as a Series with the same length and index as df, and then subtract it from the column with sub:

print(df.groupby('group')['Apple'].transform('mean'))
0    4.5
1    4.5
2    4.5
3    4.5
Name: Apple, dtype: float64
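
To illustrate why transform is used here rather than a plain mean(), the following is a minimal sketch that compares the two on the question's example data. The variable names group_means and aligned_means are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3', '7'],
                   'group': ['Young', 'Young', 'Old', 'Old'],
                   'Apple': [7, 2, 5, 4],
                   'Orange': [3, 6, 4, 4]})

# Aggregation: one value per group, indexed by the group labels.
group_means = df.groupby('group')['Apple'].mean()

# Transform: the group mean repeated for every row, indexed like df.
aligned_means = df.groupby('group')['Apple'].transform('mean')

print(len(group_means))                      # 2 (one entry per group)
print(len(aligned_means))                    # 4 (one entry per row of df)
print(aligned_means.index.equals(df.index))  # True
```

Because the transformed Series shares the index of df, it can be subtracted from df['Apple'] row by row without any merging or reindexing.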

df = pd.DataFrame({'id': ['1', '2', '3', '7'],
                   'group': ['Young', 'Young', 'Old', 'Old'],
                   'Apple': [7, 2, 5, 4],
                   'Orange': [3, 6, 4, 4],
                   })
df['Apple Difference'] = df['Apple'].sub(df.groupby('group')['Apple'].transform('mean'))
print(df)
   Apple  Orange  group id  Apple Difference
0      7       3  Young  1               2.5
1      2       6  Young  2              -2.5
2      5       4    Old  3               0.5
3      4       4    Old  7              -0.5
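
Since the question mentions both apples and oranges, the same idea can probably be applied to several columns at once. This is a minimal sketch on the same example data; the list name fruit_cols and the ' Difference' suffix for the new columns are choices made here for illustration, not something the original answer prescribes.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3', '7'],
                   'group': ['Young', 'Young', 'Old', 'Old'],
                   'Apple': [7, 2, 5, 4],
                   'Orange': [3, 6, 4, 4]})

# Hypothetical helper list: the columns to compare against their group mean.
fruit_cols = ['Apple', 'Orange']

# Group mean of each fruit column, repeated per row and aligned to df's index.
group_means = df.groupby('group')[fruit_cols].transform('mean')

# Subtract column-wise and attach the results as new, suffixed columns.
diffs = (df[fruit_cols] - group_means).add_suffix(' Difference')
df = df.join(diffs)

print(df)
```

Selecting a list of columns before transform returns a DataFrame instead of a Series, so the subtraction lines up by both index and column name.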

answered Sep 21 '22 06:09

jezrael

Tags:

python-3.x

pandas

difference

pandas-groupby

mean
