Conditional Cumulative Sum Based Multiple Pandas Columns

Question

I have a dataframe that contains multiple "stacks" and their corresponding "lengths".

df = pd.DataFrame({'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'], 'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'], 'stack-1-length': [3, 1, 1, 2, 3], 'stack-2-length': [3, 1, 3, 1, 2]})

  stack-1-material stack-2-material  stack-1-length  stack-2-length
0             rock             rock               3               3
1            paper            paper               1               1
2            paper             rock               1               3
3         scissors            paper               2               1
4             rock         scissors               3               2

I am trying to create a separate column for each material that tracks the cumulative sum of the length regardless of which "stack" they're. I've tried using groupby but am only able to get the cumulative sum into a single column. Here is what I'm looking for:

  stack-1-material stack-2-material  stack-1-length  stack-2-length  rock_cumsum  paper_cumsum  scissors_cumsum
0             rock             rock               3               3            6             0                0
1            paper            paper               1               1            6             2                0
2            paper             rock               1               3            9             3                0
3         scissors            paper               2               1            9             4                2
4             rock         scissors               3               2           12             4                4

Ben.T · Accepted Answer

you can use the columns materials as mask on the columns length, then sum along the column and cumsum, for each material.

#separate material and length
material = df.filter(like='material').to_numpy()
lentgh = df.filter(like='length')

# get all unique material
l_mat = np.unique(material)

# iterate over nique materials
for mat in l_mat:
    df[f'{mat}_cumsum'] = lentgh.where(material==mat).sum(axis=1).cumsum()

print(df)
  stack-1-material stack-2-material  stack-1-length  stack-2-length  \
0             rock             rock               3               3   
1            paper            paper               1               1   
2            paper             rock               1               3   
3         scissors            paper               2               1   
4             rock         scissors               3               2   

   rock_cumsum  paper_cumsum  scissors_cumsum  
0          6.0           0.0              0.0  
1          6.0           2.0              0.0  
2          9.0           3.0              0.0  
3          9.0           4.0              2.0  
4         12.0           4.0              4.0

ALollz · Answer

First, reverse your column names that way we can use wide_to_long to reshape the DataFrame.

Then take the cumsum within material and determine the max value per material per row. We can then reshape this and ffill and replace the remaining NaN with 0s and join back to the original.

df.columns = ['-'.join(x[::-1]) for x in df.columns.str.rsplit('-', n=1)]

res = (pd.wide_to_long(df.reset_index(), stubnames=['material', 'length'], 
                       i='index', j='whatever', suffix='.*')
         .sort_index(level=0))

#                material  length
#index whatever                  
#0     -stack-1      rock       3
#      -stack-2      rock       3
#1     -stack-1     paper       1
#      -stack-2     paper       1
#2     -stack-1     paper       1
#      -stack-2      rock       3
#3     -stack-1  scissors       2
#      -stack-2     paper       1
#4     -stack-1      rock       3
#      -stack-2  scissors       2

res['csum'] = res.groupby('material')['length'].cumsum()
res = (res.groupby(['index', 'material'])['csum'].max()
          .unstack(-1).ffill().fillna(0, downcast='infer')
          .add_suffix('_cumsum'))

df = pd.concat([df, res], axis=1)

  material-stack-1 material-stack-2  length-stack-1  length-stack-2  paper_cumsum  rock_cumsum  scissors_cumsum
0             rock             rock               3               3             0            6                0
1            paper            paper               1               1             2            6                0
2            paper             rock               1               3             3            9                0
3         scissors            paper               2               1             4            9                2
4             rock         scissors               3               2             4           12                4

Conditional Cumulative Sum Based Multiple Pandas Columns

Tags:

python-3.x

pandas

cumulative-sum

Luxo_Jr

2 Answers

Ben.T

ALollz

Recent Activity

Donate For Us

Conditional Cumulative Sum Based Multiple Pandas Columns

Tags:

python-3.x

pandas

cumulative-sum

Luxo_Jr

2 Answers

Ben.T

ALollz

Related questions

Recent Activity

Donate For Us