Pandas: How to assign sum() or mean() to df.groupby inside a function?

I'd like to wrap df.groupby(pd.TimeGrouper(freq='M')).sum() in a function so that I can pass sum(), mean() or count() to that function as an argument. I've asked a similar question earlier here, but I don't think I can use the same technique in this particular case.

Here is a snippet with reproducible input:

# Imports
import pandas as pd
import numpy as np

# DataFrame of ones and zeros
# 100 rows and 4 columns
# Indexed by dates
np.random.seed(12345678)
df = pd.DataFrame(np.random.randint(0,2,size=(100, 4)), columns=list('ABCD'))
datelist = pd.date_range(pd.datetime(2017, 1, 1).strftime('%Y-%m-%d'), periods=100).tolist()
df['dates'] = datelist 
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
print(df.head(10))

Which gives:

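The setup above relies on pd.datetime, which was deprecated in pandas 1.0 and later removed, so it may not run on a current install. A minimal sketch that should build the same frame (same seed, same randint call) by passing pd.date_range directly as the index; the index name "dates" just mirrors the original column name:

```python
import numpy as np
import pandas as pd

# Same random ones and zeros as the original snippet.
np.random.seed(12345678)
df = pd.DataFrame(
    np.random.randint(0, 2, size=(100, 4)),
    columns=list("ABCD"),
    # 100 consecutive days starting 2017-01-01, used directly as a DatetimeIndex
    index=pd.date_range("2017-01-01", periods=100, name="dates"),
)
print(df.head(10))
```

This skips the strftime/set_index/to_datetime round trip, since date_range already returns a DatetimeIndex.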

With this we can do:

df2 = df.groupby(pd.TimeGrouper(freq='M')).sum()
print(df2)

And get:


Or we can do:

df3 = df.groupby(pd.TimeGrouper(freq='M')).mean()
print(df3)

And get:

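pd.TimeGrouper was deprecated in pandas 0.21 and removed in 1.0; pd.Grouper(freq=...) is its replacement. pandas 2.2 also renamed the month-end alias from 'M' to 'ME'. A sketch of the same monthly sum and mean on a current pandas, assuming a hypothetical helper month_end_alias() that picks whichever alias the installed version accepts:

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset


def month_end_alias():
    """Hypothetical helper: 'ME' on pandas >= 2.2, 'M' on older versions."""
    try:
        to_offset("ME")  # older pandas rejects this alias with ValueError
        return "ME"
    except ValueError:
        return "M"


np.random.seed(12345678)
df = pd.DataFrame(
    np.random.randint(0, 2, size=(100, 4)),
    columns=list("ABCD"),
    index=pd.date_range("2017-01-01", periods=100, name="dates"),
)

freq = month_end_alias()
# A fresh Grouper per call; each groups the DatetimeIndex by calendar month.
df2 = df.groupby(pd.Grouper(freq=freq)).sum()   # number of ones per month
df3 = df.groupby(pd.Grouper(freq=freq)).mean()  # share of ones per month
print(df2)
print(df3)
```

df.resample(freq).sum() should give the same result for a DatetimeIndex; Grouper is shown because it keeps the groupby(...) shape used in the question.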

Here's part of the procedure wrapped into a function:

# My function
def function1(df):
    df = df.groupby(pd.TimeGrouper(freq='M')).sum()
    return df

# Function1 call
df4 = function1(df = df)
print(df4)

And that works just fine:


The problem occurs when I try to pass sum() or mean() as an argument to function2, like this:

# My function with sum() as an argument
def function2(df, fun):
    df = df.groupby(pd.TimeGrouper(freq='M')).fun
    return df

My first attempt raises a TypeError:

# Function2 test 1
df5 = function2(df = df, fun = sum())

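In test 1 the error happens before function2 is even entered: fun = sum() calls Python's built-in sum with no arguments, and the builtin requires an iterable, so pandas is never involved. Passing the function object (no parentheses) avoids that particular error, although the builtin sum is still not the groupby method. A small pure-Python sketch:

```python
# `fun=sum()` evaluates the built-in sum with no arguments first,
# which raises TypeError on its own.
try:
    sum()
    first_attempt_error = None
except TypeError as exc:
    first_attempt_error = exc
print(type(first_attempt_error).__name__)  # TypeError

# Without parentheses, the function object itself is passed along.
fun = sum
print(callable(fun), fun([1, 2, 3]))  # True 6
```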

My second attempt raises an AttributeError:

# Function2 test 2
df6 = function2(df = df, fun = 'sum()')

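In test 2 (and test 3 in the full listing), the AttributeError comes from .fun itself: Python treats the name after the dot as a literal attribute name, so it looks for an attribute called fun, no matter what the local variable fun holds. The built-in getattr instead looks up an attribute whose name is stored in a string, and that string must not include parentheses. A pure-Python sketch; the Grouped class is a hypothetical stand-in for a groupby object:

```python
class Grouped:
    """Hypothetical stand-in for a groupby object that has a sum() method."""

    def sum(self):
        return "summed"


grouped = Grouped()
fun = "sum"

# grouped.fun asks for an attribute literally named "fun"; the variable
# `fun` is never consulted, so this raises AttributeError.
try:
    grouped.fun
except AttributeError as exc:
    print("AttributeError:", exc)

# getattr resolves the name held in the variable; then we call the method.
method = getattr(grouped, fun)
print(method())  # summed

# 'sum()' (with parentheses) is not a valid attribute name to look up.
print(hasattr(grouped, "sum()"))  # False
```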

Is it possible to make a few adjustments to this setup to get it working? (I tried another version with 'M' as an argument for freq, and that worked just fine.) Or is this just not the way these things are done?
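One small adjustment is to pass the method name as a string (without parentheses) and resolve it with getattr on the groupby object, alongside a freq argument. This is a sketch, not the only option; pd.Grouper replaces the removed pd.TimeGrouper, and month_end_alias() is a hypothetical helper that picks 'ME' or 'M' depending on the pandas version:

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset


def month_end_alias():
    """Hypothetical helper: 'ME' on pandas >= 2.2, 'M' on older versions."""
    try:
        to_offset("ME")
        return "ME"
    except ValueError:
        return "M"


def function2(df, fun, freq=None):
    """Group df by period and call the groupby method named `fun`.

    `fun` is a method name such as 'sum', 'mean' or 'count' (no parentheses).
    `freq` defaults to calendar month end.
    """
    grouped = df.groupby(pd.Grouper(freq=freq or month_end_alias()))
    return getattr(grouped, fun)()


np.random.seed(12345678)
df = pd.DataFrame(
    np.random.randint(0, 2, size=(100, 4)),
    columns=list("ABCD"),
    index=pd.date_range("2017-01-01", periods=100, name="dates"),
)

df5 = function2(df, "sum")
df6 = function2(df, "mean")
df7 = function2(df, "count")
print(df5)
```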

Thank you for any suggestions!

Here is the whole mess for easy copy-and-paste:

#%%

# Imports
import pandas as pd
import numpy as np

# DataFrame of ones and zeros
# 100 rows across 4 columns
# Indexed by dates
np.random.seed(12345678)
df = pd.DataFrame(np.random.randint(0,2,size=(100, 4)), columns=list('ABCD'))
datelist = pd.date_range(pd.datetime(2017, 1, 1).strftime('%Y-%m-%d'), periods=100).tolist()
df['dates'] = datelist 
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
print(df.head(10))

# Calculate sum per month
df2 = df.groupby(pd.TimeGrouper(freq='M')).sum()
print(df2)

# Or calculate average per month
df3 = df.groupby(pd.TimeGrouper(freq='M')).mean()
print(df3)

# My function
def function1(df):
    df = df.groupby(pd.TimeGrouper(freq='M')).sum()
    return df

# Function1 test
df4 = function1(df = df)
print(df4)
# So far so good
#%%
# My function with sum() as argument
def function2(df, fun):
    print(fun)
    df = df.groupby(pd.TimeGrouper(freq='M')).fun
    return df

# Function2 test 1
# df5 = function2(df = df, fun = sum())

# Function2 test 2
# df6 = function2(df = df, fun = 'sum()')

# Function2 test 3
# df7 = function2(df = df, fun = sum)
vestland asked Aug 08 '17 at 14:08



1 Answer

You need to use apply:

def function2(df, fun):
    return df.groupby(pd.TimeGrouper(freq='M')).apply(fun)

Just make sure fun is a callable that takes a pd.DataFrame.
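A runnable sketch of the apply approach on a current pandas, with pd.Grouper standing in for the removed pd.TimeGrouper. fun receives each month's sub-DataFrame; a reducer that returns one value per column (a Series) yields one row per month. month_end_alias() is a hypothetical helper for the 'M'/'ME' alias change in pandas 2.2:

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset


def month_end_alias():
    """Hypothetical helper: 'ME' on pandas >= 2.2, 'M' on older versions."""
    try:
        to_offset("ME")
        return "ME"
    except ValueError:
        return "M"


def function2(df, fun):
    # fun is called once per month with that month's rows as a DataFrame.
    return df.groupby(pd.Grouper(freq=month_end_alias())).apply(fun)


np.random.seed(12345678)
df = pd.DataFrame(
    np.random.randint(0, 2, size=(100, 4)),
    columns=list("ABCD"),
    index=pd.date_range("2017-01-01", periods=100, name="dates"),
)

df5 = function2(df, lambda g: g.sum())  # callable taking a DataFrame
df6 = function2(df, pd.DataFrame.mean)  # unbound method works the same way
print(df5)
```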


However, you should probably use agg. If fun reduces each column to a scalar, the way sum and mean do, then this should work. Something to consider:

df.groupby(pd.TimeGrouper('M')).agg(['sum', 'mean', fun])
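With agg, fun can be a method name such as 'sum', a callable that reduces one column (a Series) to a scalar, or a list mixing both; a list produces two-level columns (original column, function name). A sketch on a current pandas, where value_range is a hypothetical custom reducer and month_end_alias() a hypothetical helper for the 'M'/'ME' alias change:

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset


def month_end_alias():
    """Hypothetical helper: 'ME' on pandas >= 2.2, 'M' on older versions."""
    try:
        to_offset("ME")
        return "ME"
    except ValueError:
        return "M"


def function2(df, fun):
    # fun: a name like 'sum', a Series -> scalar callable, or a list of these.
    return df.groupby(pd.Grouper(freq=month_end_alias())).agg(fun)


def value_range(s):
    """Hypothetical custom reducer: max minus min of one column."""
    return s.max() - s.min()


np.random.seed(12345678)
df = pd.DataFrame(
    np.random.randint(0, 2, size=(100, 4)),
    columns=list("ABCD"),
    index=pd.date_range("2017-01-01", periods=100, name="dates"),
)

monthly_sum = function2(df, "sum")
combined = function2(df, ["sum", "mean", value_range])
print(combined)
```

Passing strings like 'sum' lets pandas use its built-in reductions, so this version avoids both the getattr lookup and a per-group Python call for the common cases.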
piRSquared answered Oct 19 '22 at 23:10