I'd like to wrap df.groupby(pd.TimeGrouper(freq='M')).sum() in a function so that I can pass sum(), mean() or count() as an argument to that function. I've asked a similar question earlier here, but I don't think I can use the same technique in this particular case.
Here is a snippet with reproducible input:
# Imports
import pandas as pd
import numpy as np
# Dataframe with 1 or zero
# 100 rows and 4 columns
# Indexed by dates
np.random.seed(12345678)
df = pd.DataFrame(np.random.randint(0,2,size=(100, 4)), columns=list('ABCD'))
datelist = pd.date_range(pd.datetime(2017, 1, 1).strftime('%Y-%m-%d'), periods=100).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
print(df.head(10))
Which gives:
With this we can do:
df2 = df.groupby(pd.TimeGrouper(freq='M')).sum()
print(df2)
And get:
Or we can do:
df3 = df.groupby(pd.TimeGrouper(freq='M')).mean()
print(df3)
And get:
Here's part of the procedure wrapped into a function:
# My function
def function1(df):
    df = df.groupby(pd.TimeGrouper(freq='M')).sum()
    return df
# Function1 call
df4 = function1(df = df)
print(df4)
And that works just fine:
The problem occurs when I try to add sum() or mean() as an argument in function2, like this:
# My function with sum() as an argument
def function2(df, fun):
    df = df.groupby(pd.TimeGrouper(freq='M')).fun
    return df
My first attempt raises a TypeError:
# Function2 test 1
df5 = function2(df = df, fun = sum())
My second attempt raises an AttributeError:
# Function2 test 2
df6 = function2(df = df, fun = 'sum()')
Is it possible to make a few adjustments to this setup to get it working? (I tried another version with 'M' as an argument for freq, and that worked just fine). Or is this just not the way these things are done?
Thank you for any suggestions!
Here is the whole mess for an easy copy&paste:
#%%
# Imports
import pandas as pd
import numpy as np
# Dataframe with 1 or zero
# 100 rows across 4 columns
# Indexed by dates
np.random.seed(12345678)
df = pd.DataFrame(np.random.randint(0,2,size=(100, 4)), columns=list('ABCD'))
datelist = pd.date_range(pd.datetime(2017, 1, 1).strftime('%Y-%m-%d'), periods=100).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
print(df.head(10))
# Calculate sum per month
df2 = df.groupby(pd.TimeGrouper(freq='M')).sum()
print(df2)
# Or calculate average per month
df3 = df.groupby(pd.TimeGrouper(freq='M')).mean()
print(df3)
# My function
def function1(df):
    df = df.groupby(pd.TimeGrouper(freq='M')).sum()
    return df
# Function1 test
df4 = function1(df = df)
print(df4)
# So far so good
#%%
# My function with sum() as argument
def function2(df, fun):
    print(fun)
    df = df.groupby(pd.TimeGrouper(freq='M')).fun
    return df
# Function2 test 1
# df5 = function2(df = df, fun = sum())
# Function2 test 2
# df6 = function2(df = df, fun = 'sum()')
# Function2 test 3
# df7 = function2(df = df, fun = sum)
Use DataFrame.groupby().sum() to group rows by one or more columns and compute the sum for each group. groupby() returns a DataFrameGroupBy object, whose sum() method computes the sum of each column within each group.
To store the output of groupby().sum() in a new column, first apply the groupby().sum() operation and then assign the result to a new column.
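One way to sketch the new-column idea is with transform, which returns a result aligned to the original rows so it can be assigned directly. The sales frame and its column names are hypothetical, made up only for this illustration.

```python
import pandas as pd

# Hypothetical sample data, not from the question
sales = pd.DataFrame({
    "store": ["a", "a", "b", "b", "b"],
    "units": [1, 2, 3, 4, 5],
})

# transform("sum") broadcasts each group's total back onto its rows,
# so the result has the same length as the original frame
sales["store_total"] = sales.groupby("store")["units"].transform("sum")
```

Each row now carries the total for its own store next to its individual value.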
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups. The by argument is used to determine the groups for the groupby.
The sum() method adds all values in each column and returns the sum for each column. By specifying the column axis (axis='columns'), the sum() method adds across the columns instead and returns the sum of each row.
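A small illustration of the two directions, using a hypothetical two-row frame named scores:

```python
import pandas as pd

# Hypothetical sample data, not from the question
scores = pd.DataFrame({"A": [1, 2], "B": [10, 20]})

# Default: one total per column
col_totals = scores.sum()

# axis="columns": add across the columns, giving one total per row
row_totals = scores.sum(axis="columns")
```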
You need to use apply:
def function2(df, fun):
    return df.groupby(pd.TimeGrouper(freq='M')).apply(fun)
Just make sure fun is a callable that takes a pd.DataFrame.
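A minimal sketch of the apply approach, under a few assumptions. pd.Grouper stands in for pd.TimeGrouper, which was removed from later pandas releases. The 'MS' (month start) frequency is used only because the month-end alias differs between pandas versions; the monthly buckets are the same, only the index labels change. The frame is rebuilt here as sample data in the shape of the question's df.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's frame: 100 daily rows of 0/1 values
rng = np.random.RandomState(12345678)
df = pd.DataFrame(
    rng.randint(0, 2, size=(100, 4)),
    columns=list("ABCD"),
    index=pd.date_range("2017-01-01", periods=100, name="dates"),
)

def function2(df, fun):
    # fun is passed uncalled; apply hands it each month's sub-DataFrame
    return df.groupby(pd.Grouper(freq="MS")).apply(fun)

# Pass the function object itself, not sum() and not the string 'sum()'
monthly_sum = function2(df, pd.DataFrame.sum)
monthly_mean = function2(df, pd.DataFrame.mean)
```

The difference from the failing attempts is that fun is a function object that gets called inside function2, rather than an attribute looked up by the literal name fun.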
However, you should probably use agg. If fun reduces columns to a scalar, similar to sum or mean, then this should work. Something to consider.
df.groupby(pd.TimeGrouper('M')).agg(['sum', 'mean', fun])
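The agg variant could be wrapped the same way. This is a sketch under the same assumptions: pd.Grouper replaces pd.TimeGrouper, 'MS' replaces 'M' for portability across pandas versions, and the freq parameter and the value_range helper are hypothetical additions for illustration. With agg, the aggregation can be given by name as a string, as a callable, or as a list mixing both.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's frame: 100 daily rows of 0/1 values
rng = np.random.RandomState(12345678)
df = pd.DataFrame(
    rng.randint(0, 2, size=(100, 4)),
    columns=list("ABCD"),
    index=pd.date_range("2017-01-01", periods=100, name="dates"),
)

def function2(df, fun, freq="MS"):
    # fun may be a name such as 'sum', a callable, or a list of either
    return df.groupby(pd.Grouper(freq=freq)).agg(fun)

def value_range(col):
    # Hypothetical custom reducer: one scalar per column per month
    return col.max() - col.min()

monthly_sum = function2(df, "sum")
monthly_count = function2(df, "count")
summary = function2(df, ["sum", "mean", value_range])
```

Note that the string is 'sum', without parentheses. Passing a list produces one output column per input column and aggregation, so summary has a two-level column index.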