I have a data frame (df) with column col1 which has many rows and there are rows with a common string (Collection of numbers are) and ending with different numbers(001, 002, 005). I want to extract the rows between two string (Collection of numbers are 002 to Collection of numbers are 003) and assign them to new column with same row name (Collection of numbers are 002)
col1
0 Collection of numbers are 002
1 53
2 20
3 56
4 Collection of numbers are 003
5 236
6 325
7 Collection of numbers are 005
8 96
9 23
10 63
I want to convert the how above data frame to the following format.
0 Collection of numbers are 002 Collection of numbers are 003 Collection of numbers are 005
1 53 236 96
2 20 325 23
3 56 63
Note: No repeated numbers
We could try ffill and some basic resphaping using str.split
df['headers'] = df['col1'].str.extract('(Collection.*)').ffill()
df1 = df[~df['col1'].str.contains('Collection')].copy()
df1.groupby('headers').agg(','.join)['col1'].str.split(',',expand=True).T.rename_axis('',axis='columns')
out:
Collection of numbers are 002 Collection of numbers are 003 \
0 53 236
1 20 325
2 56 None
Collection of numbers are 005
0 96
1 23
2 63
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With