I have a dataframe that contains multiple "stacks" and their corresponding "lengths".
df = pd.DataFrame({'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'], 'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'], 'stack-1-length': [3, 1, 1, 2, 3], 'stack-2-length': [3, 1, 3, 1, 2]})
stack-1-material stack-2-material stack-1-length stack-2-length
0 rock rock 3 3
1 paper paper 1 1
2 paper rock 1 3
3 scissors paper 2 1
4 rock scissors 3 2
I am trying to create a separate column for each material that tracks the cumulative sum of the length regardless of which "stack" they're. I've tried using groupby but am only able to get the cumulative sum into a single column. Here is what I'm looking for:
stack-1-material stack-2-material stack-1-length stack-2-length rock_cumsum paper_cumsum scissors_cumsum
0 rock rock 3 3 6 0 0
1 paper paper 1 1 6 2 0
2 paper rock 1 3 9 3 0
3 scissors paper 2 1 9 4 2
4 rock scissors 3 2 12 4 4
you can use the columns materials as mask on the columns length, then sum along the column and cumsum, for each material.
#separate material and length
material = df.filter(like='material').to_numpy()
lentgh = df.filter(like='length')
# get all unique material
l_mat = np.unique(material)
# iterate over nique materials
for mat in l_mat:
df[f'{mat}_cumsum'] = lentgh.where(material==mat).sum(axis=1).cumsum()
print(df)
stack-1-material stack-2-material stack-1-length stack-2-length \
0 rock rock 3 3
1 paper paper 1 1
2 paper rock 1 3
3 scissors paper 2 1
4 rock scissors 3 2
rock_cumsum paper_cumsum scissors_cumsum
0 6.0 0.0 0.0
1 6.0 2.0 0.0
2 9.0 3.0 0.0
3 9.0 4.0 2.0
4 12.0 4.0 4.0
First, reverse your column names that way we can use wide_to_long to reshape the DataFrame.
Then take the cumsum within material and determine the max value per material per row. We can then reshape this and ffill and replace the remaining NaN with 0s and join back to the original.
df.columns = ['-'.join(x[::-1]) for x in df.columns.str.rsplit('-', n=1)]
res = (pd.wide_to_long(df.reset_index(), stubnames=['material', 'length'],
i='index', j='whatever', suffix='.*')
.sort_index(level=0))
# material length
#index whatever
#0 -stack-1 rock 3
# -stack-2 rock 3
#1 -stack-1 paper 1
# -stack-2 paper 1
#2 -stack-1 paper 1
# -stack-2 rock 3
#3 -stack-1 scissors 2
# -stack-2 paper 1
#4 -stack-1 rock 3
# -stack-2 scissors 2
res['csum'] = res.groupby('material')['length'].cumsum()
res = (res.groupby(['index', 'material'])['csum'].max()
.unstack(-1).ffill().fillna(0, downcast='infer')
.add_suffix('_cumsum'))
df = pd.concat([df, res], axis=1)
material-stack-1 material-stack-2 length-stack-1 length-stack-2 paper_cumsum rock_cumsum scissors_cumsum
0 rock rock 3 3 0 6 0
1 paper paper 1 1 2 6 0
2 paper rock 1 3 3 9 0
3 scissors paper 2 1 4 9 2
4 rock scissors 3 2 4 12 4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With