I have a pandas DataFrame of the form:
A   B   C     D
A1  6   7.5   NaN
A1  4   23.8  <D1 0.0 6.5 12 4, D2 1.0 4 3.5 1>
A2  7   11.9  <D1 2.0 7.5 10 2, D3 7.5 4.2 13.5 4>
A3  11  0.8   <D2 2.0 7.5 10 2, D3 7.5 4.2 13.5 4, D4 2.0 7.5 10 2, D5 7.5 4.2 13.5 4>
Column D is a raw-string column with multiple categories in each entry. The value for each category is calculated by dividing its penultimate number by its last number. For example, in the second row:
D1 = 12/4 = 3
D2 = 3.5/1 = 3.5
I need to split column D based on its categories and join the resulting columns to my DataFrame. The problem is that the column is dynamic and a single entry can contain nearly 35-40 categories. For now all I'm doing is a brute-force approach, iterating over every row, which is very slow for large datasets; a simplified sketch of it is below. Can someone please help me?
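What I'm doing now is roughly this (a simplified sketch, not my exact code):

import pandas as pd

def parse_d(raw):
    # raw looks like '<D1 0.0 6.5 12 4, D2 1.0 4 3.5 1>'
    out = {}
    if not isinstance(raw, str):
        return out
    for part in raw.strip('<>').split(','):
        fields = part.split()
        # divide the penultimate value by the last value for each category
        out[fields[0]] = float(fields[-2]) / float(fields[-1])
    return out

# applied row by row -- this is the slow part
# parsed = df['D'].apply(parse_d).apply(pd.Series)
# df = df.drop(columns='D').join(parsed)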
EXPECTED OUTCOME
A   B   C     D1   D2   D3   D4   D5
A1  6   7.5   NaN  NaN  NaN  NaN  NaN
A1  4   23.8  3.0  3.5  NaN  NaN  NaN
A2  7   11.9  5.0  NaN  3.4  NaN  NaN
A3  11  0.8   NaN  5.0  3.4  5.0  3.4
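For reproducibility, the sample frame above can be built like this (the exact string literals in D are assumptions based on the display):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['A1', 'A1', 'A2', 'A3'],
    'B': [6, 4, 7, 11],
    'C': [7.5, 23.8, 11.9, 0.8],
    'D': [np.nan,
          '<D1 0.0 6.5 12 4, D2 1.0 4 3.5 1>',
          '<D1 2.0 7.5 10 2, D3 7.5 4.2 13.5 4>',
          '<D2 2.0 7.5 10 2, D3 7.5 4.2 13.5 4, D4 2.0 7.5 10 2, D5 7.5 4.2 13.5 4>'],
})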
Use:
d = df['D'].str.extractall(r'(D\d+).*?([\d.]+)\s([\d.]+)(?:,|\>)')
d = d.droplevel(1).set_index(0, append=True).astype(float)
d = df.join(d[1].div(d[2]).round(1).unstack()).drop(columns='D')
Details:
Use Series.str.extractall to extract all the capture groups from column D, as specified by the regex pattern (a standalone check of the pattern is sketched after the output below).
print(d)

          0     1  2   # --> capture groups
  match
1 0      D1    12  4
  1      D2   3.5  1
2 0      D1    10  2
  1      D3  13.5  4
3 0      D2    10  2
  1      D3  13.5  4
  2      D4    10  2
  3      D5  13.5  4
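To inspect what the pattern captures on a single cell, here is a quick standalone check with re (the test string is row 2's D value):

import re

pattern = r'(D\d+).*?([\d.]+)\s([\d.]+)(?:,|\>)'
cell = '<D1 0.0 6.5 12 4, D2 1.0 4 3.5 1>'

print(re.findall(pattern, cell))
# [('D1', '12', '4'), ('D2', '3.5', '1')]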
Use DataFrame.droplevel + set_index with the optional parameter append=True to drop the unused match level and append the category names (capture group 0) as a new index level on the DataFrame.
print(d)

         1    2
  0
1 D1  12.0  4.0
  D2   3.5  1.0
2 D1  10.0  2.0
  D3  13.5  4.0
3 D2  10.0  2.0
  D3  13.5  4.0
  D4  10.0  2.0
  D5  13.5  4.0
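Note that extractall returns the capture groups as strings (object dtype); the astype(float) is what makes the division below possible. A quick check:

print(df['D'].str.extractall(r'(D\d+).*?([\d.]+)\s([\d.]+)(?:,|\>)').dtypes)
# 0    object
# 1    object
# 2    object
# dtype: object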
Use Series.div to divide column 1 by column 2, Series.round to round the values, and Series.unstack to reshape the result so each category becomes a column; finally, use DataFrame.join to join the new dataframe with df.
print(d)

    A   B     C   D1   D2   D3   D4   D5
0  A1   6   7.5  NaN  NaN  NaN  NaN  NaN
1  A1   4  23.8  3.0  3.5  NaN  NaN  NaN
2  A2   7  11.9  5.0  NaN  3.4  NaN  NaN
3  A3  11   0.8  NaN  5.0  3.4  5.0  3.4
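Putting the steps together as a single reusable function (the name split_d_column is mine, for illustration; drop(columns='D') is the non-deprecated spelling of the drop call):

import pandas as pd

def split_d_column(df: pd.DataFrame) -> pd.DataFrame:
    # extract (category, numerator, denominator) triples from each D string
    d = df['D'].str.extractall(r'(D\d+).*?([\d.]+)\s([\d.]+)(?:,|\>)')
    # drop the match level, index by category name, convert values to float
    d = d.droplevel(1).set_index(0, append=True).astype(float)
    # ratio per category, one column per category, joined back onto df
    return df.join(d[1].div(d[2]).round(1).unstack()).drop(columns='D')

result = split_d_column(df)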