pandas GroupBy and cumulative mean of previous rows in group

I have a dataframe which looks like this:

pd.DataFrame({'category': [1,1,1,2,2,2,3,3,3,4],
              'order_start': [1,2,3,1,2,3,1,2,3,1],
              'time': [1, 4, 3, 6, 8, 17, 14, 12, 13, 16]})
Out[40]: 
   category  order_start  time
0         1            1     1
1         1            2     4
2         1            3     3
3         2            1     6
4         2            2     8
5         2            3    17
6         3            1    14
7         3            2    12
8         3            3    13
9         4            1    16

I would like to create a new column which contains the mean of the previous times of the same category. How can I create it?

The new column should look like this:

pd.DataFrame({'category': [1,1,1,2,2,2,3,3,3,4],
              'order_start': [1,2,3,1,2,3,1,2,3,1],
              'time': [1, 4, 3, 6, 8, 17, 14, 12, 13, 16],
              'mean': [np.nan, 1, 2.5, np.nan, 6, 7, np.nan, 14, 13, np.nan]})
Out[41]: 
   category  order_start  time  mean
0         1            1     1   NaN
1         1            2     4   1.0    = 1 / 1
2         1            3     3   2.5    = (4+1)/2
3         2            1     6   NaN
4         2            2     8   6.0    = 6 / 1
5         2            3    17   7.0    = (8+6) / 2
6         3            1    14   NaN
7         3            2    12  14.0
8         3            3    13  13.0
9         4            1    16   NaN

Note: for the first row of a category there are no previous times, so the mean should be NaN.

EDIT: as stated by cs95, my question is not really the same as this one, since an expanding window is required here.

"create a new column which contains the mean of the previous times of the same category" sounds like a good use case for GroupBy.expanding (and a shift):

df['mean'] = (
    df.groupby('category', group_keys=False)['time']  # group_keys=False keeps the original index on pandas >= 2.0
      .apply(lambda x: x.shift().expanding().mean()))
df
   category  order_start  time  mean
0         1            1     1   NaN
1         1            2     4   1.0
2         1            3     3   2.5
3         2            1     6   NaN
4         2            2     8   6.0
5         2            3    17   7.0
6         3            1    14   NaN
7         3            2    12  14.0
8         3            3    13  13.0
9         4            1    16   NaN
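
To see why the shift comes first, here is a minimal sketch of the same two steps applied to the time values of category 2 alone. The variable names are only illustrative.

```python
import pandas as pd

# The 'time' values of category 2 from the question
times = pd.Series([6, 8, 17])

# shift() moves every value down one row, so each row only "sees" earlier times
shifted = times.shift()                 # NaN, 6.0, 8.0

# expanding().mean() averages everything seen so far, skipping the leading NaN
prev_mean = shifted.expanding().mean()  # NaN, 6.0, 7.0

print(prev_mean)
```

Without the shift, the expanding mean would include the current row's own time.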

Another way to calculate this is without the apply (chaining two groupby calls):

df['mean'] = (
    df.groupby('category')['time']
      .shift()
      .groupby(df['category'])
      .expanding()
      .mean()
      .to_numpy())  # replace to_numpy() with `.values` for pd.__version__ < 0.24
df
   category  order_start  time  mean
0         1            1     1   NaN
1         1            2     4   1.0
2         1            3     3   2.5
3         2            1     6   NaN
4         2            2     8   6.0
5         2            3    17   7.0
6         3            1    14   NaN
7         3            2    12  14.0
8         3            3    13  13.0
9         4            1    16   NaN

In terms of performance, it really depends on the number and size of your groups.
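
One caveat with the second version: to_numpy() discards the index, so the values are assigned by position, in group order. That matches the row order here only because the dataframe is already sorted by category. Below is a minimal sketch of an index-aligned variant, assuming pandas 0.24 or later for Series.droplevel. The helper name expanding_mean_of_previous is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'category': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
                   'order_start': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
                   'time': [1, 4, 3, 6, 8, 17, 14, 12, 13, 16]})

def expanding_mean_of_previous(frame, key, value):
    # Shift within each group so a row never contributes to its own mean
    shifted = frame.groupby(key)[value].shift()
    # The expanding result is indexed by (group key, original index)
    result = shifted.groupby(frame[key]).expanding().mean()
    # Drop the group level so assignment aligns on the original index
    return result.droplevel(0)

# Same rows with the categories interleaved; order within each category is kept
interleaved = df.sort_values('order_start', kind='mergesort')

df['mean'] = expanding_mean_of_previous(df, 'category', 'time')
interleaved['mean'] = expanding_mean_of_previous(interleaved, 'category', 'time')
```

Because the result is aligned on the index rather than on position, both frames get the same mean for every original row.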

Inspired by my answer here, one can define a function first:

def mean_previous(df, Category, Order, Var):
    # Sort a copy so the caller's dataframe is not reordered; the index is kept,
    # so the result still aligns with the original rows on assignment
    df = df.sort_values([Category, Order])

    # Grouped cumulative sum minus the cumulative sum within the current order
    # leaves the sum over all earlier orders of the category
    csp = df.groupby(Category)[Var].cumsum() - df.groupby([Category, Order])[Var].cumsum()

    # The same subtraction on cumulative counts gives the number of earlier rows
    ccp = df.groupby(Category)[Var].cumcount() - df.groupby([Category, Order]).cumcount()

    # 0 / 0 yields NaN for the first order of each category
    return csp / ccp

And the desired column is

df['mean'] = mean_previous(df, 'category', 'order_start', 'time')

Performance-wise, I believe it's very fast, since it relies only on vectorized cumulative groupby operations rather than apply.
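
When every row is its own step (no repeated order_start within a category) and the rows are already in the desired order, the same idea reduces to a shorter form. This is only a sketch under those two assumptions, and the helper name mean_previous_simple is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'category': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
                   'order_start': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
                   'time': [1, 4, 3, 6, 8, 17, 14, 12, 13, 16]})

def mean_previous_simple(frame, key, value):
    grouped = frame.groupby(key)[value]
    # Running sum within the group, minus the current row's own value
    prev_sum = grouped.cumsum() - frame[value]
    # Zero-based position within the group = number of earlier rows
    prev_count = grouped.cumcount()
    # The first row of each group is 0 / 0, which pandas evaluates to NaN
    return prev_sum / prev_count

df['mean'] = mean_previous_simple(df, 'category', 'time')
```

Unlike mean_previous above, this version does not sort and does not treat rows that share an order as a single step.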

Tags:

python

pandas

group-by

pandas-groupby

qwertzuiop

2 Answers

cs95

Anastasiya-Romanova 秀
