colum 1, colum2
a,b,c 30
b,c,f 40
a,g,z 50
.
.
.
Using above dataframe with col1,2 I'd like to have dataframe as below dataframe with col3, 4.
Additionally, col1 consists of values with commas. col4 consists of sum of col2 following col3.
column3, column4
a 80
b 70
c 70
f 40
g 50
z 50
Use:
df = (df.set_index('colum2')['colum1']
.str.split(',', expand=True)
.stack()
.reset_index(name='column3')
.groupby('column3', as_index=False)['colum2']
.sum()
.rename(columns={'colum2':'column4'})
)
print (df)
column3 column4
0 a 80
1 b 70
2 c 70
3 f 40
4 g 50
5 z 50
Explanation:
set_index
by column colum2
DataFrame
by split
stack
reset_index
groupby
and aggregate sum
Another solution:
from itertools import chain
a = df['colum1'].str.split(',')
lens = a.str.len()
df = pd.DataFrame({
'column3' : list(chain.from_iterable(a)),
'column4' : df['colum2'].repeat(lens)
}).groupby('column3', as_index=False)['column4'].sum()
print (df)
column3 column4
0 a 80
1 b 70
2 c 70
3 f 40
4 g 50
5 z 50
Explanation:
split
len
repeat
columns and flatten colum1
groupby
and aggregate sum
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With