My pandas DataFrame has a column of strings like this:
A=1;B=3;C=c6
A=2;C=c7;D=8
I want to extract the value of each field into its own column, using the field names as the column names, like this:
A B C D
1 3 c6 NaN
2 NaN c7 8
I tried df.str.split('=|;', expand=True), but it puts both the field names and the values into separate columns.
I also tried df.str.extract(r'=\s*([^\.]*)\s*\;', expand=True), but it only returns the first occurrence of a value.
Thank you for your help
I think you can use split in a list comprehension here: split each string first by ; and then by =, convert the resulting pairs to a dictionary, and pass the list of dictionaries to the DataFrame constructor:
print (df)
col
0 A=1;B=3;C=c6
1 A=2;C=c7;D=8
L = [dict([y.split('=') for y in x.split(';')]) for x in df['col']]
df = pd.DataFrame(L)
print (df)
A B C D
0 1 3 c6 NaN
1 2 NaN c7 8
Detail:
print (L)
[{'A': '1', 'B': '3', 'C': 'c6'}, {'A': '2', 'C': 'c7', 'D': '8'}]
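For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same dictionary approach. The helper name parse_fields and the result name wide are my own, not from the answer above. It also assumes two small hardenings that the original one-liner does not have: it splits on the first = only, so a value that itself contains = does not raise an error, and it skips empty pieces such as those left by a trailing ;.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"col": ["A=1;B=3;C=c6", "A=2;C=c7;D=8"]})


def parse_fields(s):
    """Turn 'A=1;B=3' into {'A': '1', 'B': '3'} (hypothetical helper)."""
    # Split into 'name=value' pieces, then split each piece on the first '=' only.
    # Pieces without '=' (e.g. from a trailing ';') are skipped.
    return dict(pair.split("=", 1) for pair in s.split(";") if "=" in pair)


# One dictionary per row; keys missing from a row become NaN in the result
wide = pd.DataFrame([parse_fields(s) for s in df["col"]], index=df.index)
print(wide)
```

Note that every extracted value is still a string. If some columns should be numeric, something like pd.to_numeric on those columns would be a separate step.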
Here is the regex way...
import re
df = pd.DataFrame(dict(re.findall(r'(\w+)=(\w+)', x)) for x in df['col'])
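A runnable sketch of the regex route, in case it helps. One caveat with \w+ is that it only matches letters, digits and underscores, so a value like 1.5 or a-b would be cut short. The pattern below is my own variation, not the one from the answer: it assumes field names never contain = or ; and values never contain ;. The names pattern and wide_re are illustrative as well.

```python
import re

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"col": ["A=1;B=3;C=c6", "A=2;C=c7;D=8"]})

# Group 1: field name (anything except '=' and ';')
# Group 2: value (anything except ';', may be empty)
pattern = re.compile(r"([^=;]+)=([^;]*)")

# findall returns a list of (name, value) tuples, which dict() accepts directly
wide_re = pd.DataFrame(
    [dict(pattern.findall(s)) for s in df["col"]],
    index=df.index,
)
print(wide_re)
```

As with the split version, the values come back as strings and fields missing from a row show up as NaN.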