Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas dataframe: Group by two columns and then average over another column

Assuming that I have a dataframe with the following values:

df:
col1    col2    value
1       2       3
1       2       1
2       3       1

I want to first groupby my dataframe based on the first two columns (col1 and col2) and then average over values of the thirs column (value). So the desired output would look like this:

col1    col2    avg-value
1       2       2
2       3       1

I am using the following code:

columns = ['col1','col2','avg']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
print(df[['col1','col2','avg']].groupby('col1','col2').mean())

which gets the following error:

ValueError: No axis named col2 for object type <class 'pandas.core.frame.DataFrame'>

Any help would be much appreciated.

like image 748
ahajib Avatar asked Feb 23 '16 20:02

ahajib


People also ask

Can you Groupby 2 columns in pandas?

Pandas comes with a whole host of sql-like aggregation functions you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.

How do you average multiple columns in pandas?

To get column average or mean from pandas DataFrame use either mean() and describe() method. The DataFrame. mean() method is used to return the mean of the values for the requested axis.

Can we use group by with 2 columns?

Yes, it is possible to use MySQL GROUP BY clause with multiple columns just as we can use MySQL DISTINCT clause.

How do you find mean of a column grouped by another column in pandas?

To calculate mean values grouped on another column in pandas, we will use groupby, and then we will apply mean() method. Pandas allow us a direct method called mean() which calculates the average of the set passed into it.


2 Answers

You need to pass a list of the columns to groupby, what you passed was interpreted as the axis param which is why it raised an error:

In [30]:
columns = ['col1','col2','avg']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]

print(df[['col1','col2','avg']].groupby(['col1','col2']).mean())
           avg
col1 col2     
1    2       3
     3       3
like image 111
EdChum Avatar answered Oct 23 '22 22:10

EdChum


If you want to group by multiple columns, you should put them in a list:

columns = ['col1','col2','value']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
df.loc[2] = [2,3,1]
print(df.groupby(['col1','col2']).mean())

Or slightly more verbose, for the sake of getting the word 'avg' in your aggregated dataframe:

import numpy as np
columns = ['col1','col2','value']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
df.loc[2] = [2,3,1]
print(df.groupby(['col1','col2']).agg({'value': {'avg': np.mean}}))
like image 32
jkokorian Avatar answered Oct 23 '22 22:10

jkokorian