I want to calculate the running sum of a given column (without using loops, of course). The caveat is that I have another column that specifies when to reset the running sum to the value present in that row. This is best explained by the following example:
       reset  val  desired_col
    0      0    1            1
    1      0    5            6
    2      0    4           10
    3      1    2            2
    4      1   -1           -1
    5      0    6            5
    6      0    4            9
    7      1    2            2
desired_col is the value I want to be calculated.
Pandas Series: cumsum() function. The cumsum() function returns the cumulative sum over a DataFrame or Series axis. The result is a DataFrame or Series of the same size as the input. The axis parameter takes the index or the name of the axis; 0 is equivalent to None or 'index'.
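As a quick illustration of the description above, here is a minimal sketch of cumsum() on a Series. The variable names `s` and `running` are made up for this example and the values are the first four entries of the val column from the question.

```python
import pandas as pd

# Hypothetical example data: the first four 'val' entries from the question.
s = pd.Series([1, 5, 4, 2])

# cumsum() returns a Series of the same length holding the running total.
running = s.cumsum()

print(running.tolist())  # [1, 6, 10, 12]
```

Note that a plain cumsum() never restarts, which is why the question needs something extra to handle the reset column.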
Pandas DataFrame sum() method. The sum() method adds all values in each column and returns the sum for each column. By specifying the column axis (axis='columns'), the sum() method instead adds across the columns and returns the sum of each row.
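To make the difference between the two axes concrete, here is a small sketch. The frame and its column names `a` and `b` are invented purely for illustration.

```python
import pandas as pd

# Hypothetical two-column frame, used only to show the axis argument.
frame = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# Default axis: one total per column.
col_totals = frame.sum()

# axis='columns': one total per row.
row_totals = frame.sum(axis="columns")

print(col_totals.tolist())  # [3, 30]
print(row_totals.tolist())  # [11, 22]
```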
You can use cumsum() twice:
    # df:
    #   reset  val  desired_col
    #0      0    1            1
    #1      0    5            6
    #2      0    4           10
    #3      1    2            2
    #4      1   -1           -1
    #5      0    6            5
    #6      0    4            9
    #7      1    2            2

    # cumulative sum of the reset flags: every 1 starts a new group
    df['cumsum'] = df['reset'].cumsum()

    # cumulative sums of groups to column des
    df['des'] = df.groupby(['cumsum'])['val'].cumsum()
    print(df)
    #   reset  val  desired_col  cumsum  des
    #0      0    1            1       0    1
    #1      0    5            6       0    6
    #2      0    4           10       0   10
    #3      1    2            2       1    2
    #4      1   -1           -1       2   -1
    #5      0    6            5       2    5
    #6      0    4            9       2    9
    #7      1    2            2       3    2

    # remove columns desired_col and cumsum
    df = df.drop(['desired_col', 'cumsum'], axis=1)
    print(df)
    #   reset  val  des
    #0      0    1    1
    #1      0    5    6
    #2      0    4   10
    #3      1    2    2
    #4      1   -1   -1
    #5      0    6    5
    #6      0    4    9
    #7      1    2    2
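For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. It rebuilds the frame from the question's data and passes the group ids straight to groupby() instead of storing them in a helper column. The name `group_id` is my own; this is one possible variant, not the only way to write it.

```python
import pandas as pd

# Data copied from the question (desired_col is left out on purpose).
df = pd.DataFrame({
    "reset": [0, 0, 0, 1, 1, 0, 0, 1],
    "val": [1, 5, 4, 2, -1, 6, 4, 2],
})

# First cumsum: every 1 in 'reset' bumps the counter, so each run of rows
# between resets shares one group id (0, 0, 0, 1, 2, 2, 2, 3).
group_id = df["reset"].cumsum()

# Second cumsum: running sum of 'val', restarted inside each group.
df["des"] = df.groupby(group_id)["val"].cumsum()

print(df)
```

Grouping by the Series directly avoids having to drop the temporary cumsum column afterwards. It assumes reset contains only 0 and 1, as in the question.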