Pandas dataframe - running sum with reset

I want to calculate the running sum of a given column (without using loops, of course). The caveat is that another column specifies when to reset the running sum to the value present in that row. This is best explained by the following example:

       reset  val  desired_col
    0      0    1            1
    1      0    5            6
    2      0    4           10
    3      1    2            2
    4      1   -1           -1
    5      0    6            5
    6      0    4            9
    7      1    2            2

desired_col is the column I want to compute.

Baron Yugovich asked Oct 01 '15 14:10

1 Answer

You can use cumsum() twice: once on reset to label the groups, and once on val within each group:

    import pandas as pd

    df = pd.DataFrame({'reset': [0, 0, 0, 1, 1, 0, 0, 1],
                       'val': [1, 5, 4, 2, -1, 6, 4, 2],
                       'desired_col': [1, 6, 10, 2, -1, 5, 9, 2]})
    #   reset  val  desired_col
    #0      0    1            1
    #1      0    5            6
    #2      0    4           10
    #3      1    2            2
    #4      1   -1           -1
    #5      0    6            5
    #6      0    4            9
    #7      1    2            2

    # every reset == 1 row increments the running total, which labels the groups
    df['cumsum'] = df['reset'].cumsum()
    # cumulative sum of val within each group, stored in column des
    df['des'] = df.groupby(['cumsum'])['val'].cumsum()
    print(df)
    #   reset  val  desired_col  cumsum  des
    #0      0    1            1       0    1
    #1      0    5            6       0    6
    #2      0    4           10       0   10
    #3      1    2            2       1    2
    #4      1   -1           -1       2   -1
    #5      0    6            5       2    5
    #6      0    4            9       2    9
    #7      1    2            2       3    2

    # remove columns desired_col and cumsum
    df = df.drop(['desired_col', 'cumsum'], axis=1)
    print(df)
    #   reset  val  des
    #0      0    1    1
    #1      0    5    6
    #2      0    4   10
    #3      1    2    2
    #4      1   -1   -1
    #5      0    6    5
    #6      0    4    9
    #7      1    2    2
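If you would rather not add and then drop a helper column, the same idea can probably be written more compactly by passing the group labels straight to groupby. This is only a sketch: the function name running_sum_with_reset and its parameter names are made up for illustration, and it assumes reset contains only 0 and 1.

```python
import pandas as pd


def running_sum_with_reset(df, val_col="val", reset_col="reset"):
    """Cumulative sum of val_col that restarts on every row where reset_col is 1.

    Hypothetical helper for illustration; assumes reset_col holds only 0/1.
    """
    # Each reset row increments the label, so it opens a new group.
    groups = df[reset_col].cumsum()
    # Cumulative sum of the values within each labelled group.
    return df[val_col].groupby(groups).cumsum()


df = pd.DataFrame({
    "reset": [0, 0, 0, 1, 1, 0, 0, 1],
    "val": [1, 5, 4, 2, -1, 6, 4, 2],
})
df["des"] = running_sum_with_reset(df)
print(df["des"].tolist())  # [1, 6, 10, 2, -1, 5, 9, 2]
```

The result matches desired_col from the question, and no temporary column is left on the DataFrame.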
jezrael answered Sep 21 '22 06:09
