Pandas: How to assign sum() or mean() to df.groupby inside a function?

I'd like to wrap df.groupby(pd.TimeGrouper(freq='M')).sum() in a function so that I can pass sum(), mean() or count() as an argument to that function. I've asked a similar question earlier here, but I don't think I can use the same technique in this particular case.

Here is a snippet with reproducible input:

# Imports
import pandas as pd
import numpy as np

# Dataframe with 1 or zero
# 100 rows and 4 columns
# Indexed by dates
np.random.seed(12345678)
df = pd.DataFrame(np.random.randint(0,2,size=(100, 4)), columns=list('ABCD'))
datelist = pd.date_range(pd.datetime(2017, 1, 1).strftime('%Y-%m-%d'), periods=100).tolist()
df['dates'] = datelist 
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
print(df.head(10))

Which gives:

[Image: output of print(df.head(10))]

With this we can do:

df2 = df.groupby(pd.TimeGrouper(freq='M')).sum()
print(df2)

And get:

[Image: output of print(df2), the monthly sums]

Or we can do:

df3 = df.groupby(pd.TimeGrouper(freq='M')).mean()
print(df3)

And get:

[Image: output of print(df3), the monthly means]

Here's part of the procedure wrapped into a function:

# My function
def function1(df):
    df = df.groupby(pd.TimeGrouper(freq='M')).sum()
    return df

# Function1 call
df4 = function1(df = df)
print(df4)

And that works just fine:

[Image: output of print(df4)]

The problem occurs when I try to pass sum() or mean() as an argument to function2, like this:

# My function with sum() as an argument
def function2(df, fun):
    df = df.groupby(pd.TimeGrouper(freq='M')).fun
    return df

My first attempt raises a TypeError:

# Function2 test 1
df5 = function2(df = df, fun = sum())

[Image: TypeError traceback]

My second attempt raises an AttributeError:

# Function2 test 2
df6 = function2(df = df, fun = 'sum()')

[Image: AttributeError traceback]

Is it possible to make a few adjustments to this setup to get it working? (I tried another version with 'M' as an argument for freq, and that worked just fine). Or is this just not the way these things are done?

Thank you for any suggestions!

Here is the whole mess for an easy copy&paste:

#%%

# Imports
import pandas as pd
import numpy as np

# Dataframe with 1 or zero
# 100 rows across 4 columns
# Indexed by dates
np.random.seed(12345678)
df = pd.DataFrame(np.random.randint(0,2,size=(100, 4)), columns=list('ABCD'))
datelist = pd.date_range(pd.datetime(2017, 1, 1).strftime('%Y-%m-%d'), periods=100).tolist()
df['dates'] = datelist 
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
print(df.head(10))

# Calculate sum per month
df2 = df.groupby(pd.TimeGrouper(freq='M')).sum()
print(df2)

# Or calculate average per month
df3 = df.groupby(pd.TimeGrouper(freq='M')).mean()
print(df3)

# My function
def function1(df):
    df = df.groupby(pd.TimeGrouper(freq='M')).sum()
    return df

# Function1 test
df4 = function1(df = df)
print(df4)
# So far so good
#%%
# My function with sum() as argument
def function2(df, fun):
    print(fun)
    df = df.groupby(pd.TimeGrouper(freq='M')).fun
    return df

# Function2 test 1
# df5 = function2(df = df, fun = sum())

# Function2 test 2
# df6 = function2(df = df, fun = 'sum()')

# Function2 test 3
# df7 = function2(df = df, fun = sum)

You need to use apply:

def function2(df, fun):
    return df.groupby(pd.TimeGrouper(freq='M')).apply(fun)

Just make sure fun is a callable that takes a pd.DataFrame.
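
As a minimal sketch of what such a callable looks like, the example below rebuilds a frame like the one in the question and passes lambdas that each receive one month's rows as a DataFrame. Two things here are assumptions rather than part of the answer: pd.Grouper(freq="MS") stands in for pd.TimeGrouper(freq='M'), which newer pandas releases no longer provide, and the date index is built directly with pd.date_range.

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one in the question: 100 daily rows of 0/1 values.
rng = np.random.RandomState(12345678)
df = pd.DataFrame(
    rng.randint(0, 2, size=(100, 4)),
    columns=list("ABCD"),
    index=pd.date_range("2017-01-01", periods=100),
)

def function2(df, fun):
    # Assumption: pd.Grouper(freq="MS") replaces pd.TimeGrouper(freq='M').
    # Groups are labelled by month start instead of month end.
    return df.groupby(pd.Grouper(freq="MS")).apply(fun)

# fun receives each month's rows as a DataFrame and returns one value per column.
monthly_sum = function2(df, lambda g: g.sum())
monthly_mean = function2(df, lambda g: g.mean())
print(monthly_sum)
print(monthly_mean)
```

Note that the function object itself is passed (here a lambda), not the result of calling it.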

However, you should probably use agg. It works as long as fun reduces each column to a scalar, the way sum or mean do:

df.groupby(pd.TimeGrouper('M')).agg(['sum', 'mean', fun])
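
To tie this back to the question, one possible sketch is a wrapper that forwards its argument to agg, so the aggregation can be given by name ('sum', 'mean', 'count'), as a callable, or as a list mixing both. The names monthly and value_range are hypothetical helpers introduced for illustration, and pd.Grouper(freq="MS") is again an assumed stand-in for pd.TimeGrouper(freq='M').

```python
import numpy as np
import pandas as pd

# Same kind of input as in the question: 100 daily rows of 0/1 values.
rng = np.random.RandomState(12345678)
df = pd.DataFrame(
    rng.randint(0, 2, size=(100, 4)),
    columns=list("ABCD"),
    index=pd.date_range("2017-01-01", periods=100),
)

def monthly(df, fun):
    # fun may be a name such as "sum", a callable, or a list of either.
    # Assumption: pd.Grouper(freq="MS") replaces pd.TimeGrouper(freq='M').
    return df.groupby(pd.Grouper(freq="MS")).agg(fun)

def value_range(col):
    # Hypothetical custom reducer: one scalar per column per month.
    return col.max() - col.min()

sums = monthly(df, "sum")
counts = monthly(df, "count")
combined = monthly(df, ["sum", "mean", value_range])
print(sums)
print(counts)
print(combined)
```

Passing the name as a string, or the function object without parentheses, avoids both errors from the question: sum() is evaluated with no arguments before function2 is ever entered, and .fun looks up an attribute literally named fun rather than using the value of the parameter.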

Tags: python, function, pandas

Asked by vestland. 1 answer, by piRSquared.
