 

How can I use cumsum within a group in Pandas?

I have the following DataFrame:

    import pandas as pd

    df = pd.DataFrame.from_dict({'id': ['A', 'B', 'A', 'C', 'D', 'B', 'C'],
                                 'val': [1, 2, -3, 1, 5, 6, -2],
                                 'stuff': ['12', '23232', '13', '1234', '3235', '3236', '732323']})

       id   stuff  val
    0   A      12    1
    1   B   23232    2
    2   A      13   -3
    3   C    1234    1
    4   D    3235    5
    5   B    3236    6
    6   C  732323   -2

I'd like to get a running sum of val for each id, so the desired output looks like this:

       id   stuff  val  cumsum
    0   A      12    1       1
    1   B   23232    2       2
    2   A      13   -3      -2
    3   C    1234    1       1
    4   D    3235    5       5
    5   B    3236    6       8
    6   C  732323   -2      -1
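To pin down what "running sum of val for each id" means, here is a plain-Python sketch without pandas: walk the rows in their original order and keep a separate running total per id. The rows list is simply the question's id/val columns copied by hand for illustration; it is not how pandas computes the result internally.

```python
# (id, val) pairs from the question's DataFrame, in original row order.
rows = [('A', 1), ('B', 2), ('A', -3), ('C', 1), ('D', 5), ('B', 6), ('C', -2)]

running = {}   # current running total for each id
cumsum = []    # running total of the row's id at each row, in row order
for key, val in rows:
    running[key] = running.get(key, 0) + val
    cumsum.append(running[key])

print(cumsum)  # [1, 2, -2, 1, 5, 8, -1]
```

The printed list matches the cumsum column of the desired output above.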

This is what I tried:

    df['cumsum'] = df.groupby('id').cumsum(['val'])

This is the error I got:

    ValueError: Wrong number of items passed 0, placement implies 1
asked Sep 29 '15 by Baron Yugovich


People also ask

How do you get Cumsum in pandas?

The DataFrame cumsum() method goes through the values in the DataFrame from the top, row by row, adding each value to the running total of the rows above it. The result is a DataFrame of the same shape in which the last row contains the sum of all values for each column.
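A minimal illustration of the behaviour described above, using a small made-up DataFrame (the column names a and b are hypothetical): each column is accumulated independently, and the last row of the result equals the column totals.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, -5, 1]})

# cumsum() runs down each column separately (axis=0 by default):
# row i of the result holds the sum of rows 0..i of that column.
result = df.cumsum()
print(result)
# column a -> 1, 3, 6
# column b -> 10, 5, 6

# The last row of the cumulative sum equals the per-column total.
print((result.iloc[-1] == df.sum()).all())  # True
```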

Can you sort a Groupby object?

To sort within the groups of a groupby() result, first sort the DataFrame in ascending or descending order with DataFrame.sort_values(), then group the rows with DataFrame.groupby(). Because groupby preserves the order of rows within each group, each group comes out in the sorted order.
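A short sketch of the sort-then-group approach just described, on a small hypothetical DataFrame: after sort_values(), every group lists its rows in descending val order.

```python
import pandas as pd

df = pd.DataFrame({'id': ['A', 'B', 'A', 'B', 'A'],
                   'val': [3, 1, 1, 2, 2]})

# Sort the whole frame first; groupby then keeps that row order inside each group.
sorted_df = df.sort_values('val', ascending=False)
for key, group in sorted_df.groupby('id'):
    print(key, group['val'].tolist())
# A [3, 2, 1]
# B [2, 1]
```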

Can you group by the index in pandas?

To group rows by the index, pass the name of the DataFrame's index to groupby(). DataFrame.groupby() accepts a string or a list of strings naming the columns or index levels to group by; an index level can also be selected by position with the level parameter.
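A minimal sketch of both ways of grouping by the index mentioned above, assuming a hypothetical DataFrame whose index is named key: passing the index name as a string, or selecting the level with level=0.

```python
import pandas as pd

df = pd.DataFrame({'val': [1, 2, 3, 4]},
                  index=pd.Index(['x', 'y', 'x', 'y'], name='key'))

# Group by the index through its name ...
by_name = df.groupby('key')['val'].sum()
# ... or by position with level=, which also works for an unnamed index.
by_level = df.groupby(level=0)['val'].sum()

# Both give x -> 4 and y -> 6.
print(by_name)
print(by_level)
```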

How do you count within groups in pandas?

Use count() by column name: group the rows with DataFrame.groupby(), then call count() to get the number of values in each group. count() ignores None and NaN values, and it also works on non-float columns such as strings.
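A small sketch of count() on grouped data, with made-up values that include missing entries. For comparison it also calls size(), a different GroupBy method that counts all rows per group, missing values included.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': ['A', 'A', 'B', 'B', 'B'],
                   'val': [1.0, np.nan, 2.0, None, 3.0],
                   'label': ['x', 'y', None, 'z', 'w']})

# count(): non-missing entries per column within each group
# (NaN/None are skipped; the string column is counted too).
counts = df.groupby('id').count()
print(counts)
# A: val 1, label 2
# B: val 2, label 2

# size(): number of rows per group, missing values included.
sizes = df.groupby('id').size()
print(sizes)
# A: 2, B: 3
```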


1 Answer

You can call transform and pass the cumsum function to add that column to your df:

    In [156]:
    df['cumsum'] = df.groupby('id')['val'].transform(pd.Series.cumsum)
    df

    Out[156]:
       id   stuff  val  cumsum
    0   A      12    1       1
    1   B   23232    2       2
    2   A      13   -3      -2
    3   C    1234    1       1
    4   D    3235    5       5
    5   B    3236    6       8
    6   C  732323   -2      -1

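With respect to your error: df.groupby('id').cumsum(['val']) calls cumsum on the whole grouped DataFrame rather than on the val column, and cumsum does not take a list of column names, so passing ['val'] does not select anything. Select the column first and call cumsum on the grouped Series instead.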

So this works:

    In [159]:
    df.groupby('id')['val'].cumsum()

    Out[159]:
    0    1
    1    2
    2   -2
    3    1
    4    5
    5    8
    6   -1
    dtype: int64
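A minimal, self-contained check on the question's data that the two approaches in this answer agree. Here transform('cumsum') is used as the string form of passing pd.Series.cumsum; both results keep the original row index, which is why either can be assigned straight back as a new column.

```python
import pandas as pd

df = pd.DataFrame({'id': ['A', 'B', 'A', 'C', 'D', 'B', 'C'],
                   'val': [1, 2, -3, 1, 5, 6, -2],
                   'stuff': ['12', '23232', '13', '1234', '3235', '3236', '732323']})

# Per-group running sum, computed two ways.
direct = df.groupby('id')['val'].cumsum()
via_transform = df.groupby('id')['val'].transform('cumsum')
print(direct.equals(via_transform))  # True

# The result is aligned on df's index, so it can be assigned directly.
df['cumsum'] = direct
print(df['cumsum'].tolist())  # [1, 2, -2, 1, 5, 8, -1]
```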
answered Oct 03 '22 by EdChum