
Extract multiple substrings matching a pattern into columns

My pandas DataFrame has a column of strings like this:

A=1;B=3;C=c6
A=2;C=c7;D=8

I want to extract the value of each field into a separate column, using the field names as the column names, like this:

A    B    C    D
1    3    c6   NaN
2    NaN  c7   8

I tried df.str.split('=|;', expand=True), but it puts the field names and the values into separate columns.

I also tried df.str.extract(r'=\s*([^\.]*)\s*\;', expand=True), but it only returns the first occurrence of the values.

Thank you for your help

Thanh Nguyen asked Jan 18 '26 21:01


2 Answers

One option is to use split in a list comprehension: split each string first by ; and then by =, convert the resulting pairs to a dictionary, and pass the list of dictionaries to the DataFrame constructor:

print (df)
            col
0  A=1;B=3;C=c6
1  A=2;C=c7;D=8

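# One dict per row: split on ';' into fields, then on '=' into [name, value] pairs
L = [dict([y.split('=') for y in x.split(';')]) for x in df['col']]

# Keys become columns; fields missing from a row are filled with NaN
df = pd.DataFrame(L)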
print (df)
   A    B   C    D
0  1    3  c6  NaN
1  2  NaN  c7    8

Detail:

print (L)
[{'A': '1', 'B': '3', 'C': 'c6'}, {'A': '2', 'C': 'c7', 'D': '8'}]
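
For reference, the steps above can be put together as a self-contained sketch. The sample data is copied from the question; the helper name parse_fields is hypothetical, and splitting on the first '=' only (so a value may itself contain '=') is an assumption beyond the original code:

```python
import pandas as pd

# Sample data from the question; the column name 'col' follows the answer above
df = pd.DataFrame({'col': ['A=1;B=3;C=c6', 'A=2;C=c7;D=8']})

def parse_fields(text):
    """Turn 'A=1;B=3' into {'A': '1', 'B': '3'} (hypothetical helper)."""
    # split('=', 1) splits on the first '=' only; pieces without '=' are skipped
    return dict(pair.split('=', 1) for pair in text.split(';') if '=' in pair)

records = [parse_fields(x) for x in df['col']]

# Each dict becomes one row; keys missing from a row become NaN
out = pd.DataFrame(records)
print(out)
```

All extracted values are strings, so a column such as A would still need pd.to_numeric if numbers are wanted.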

jezrael answered Jan 20 '26 11:01


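Here is a regex approach: re.findall returns every (field, value) pair in a row, and dict turns those pairs into one record per row. Note that \w+ only matches letters, digits, and underscores, so a value containing other characters (such as a decimal point) would be cut short.

import re
df = pd.DataFrame(dict(re.findall(r'(\w+)=(\w+)', x)) for x in df['col'])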
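
Since the question tried str.extract, which stops at the first match, a pandas-only variant of the same regex idea may be str.extractall followed by a reshape. This is a minimal sketch, assuming each field name appears at most once per row; the pattern and the group names key and value are illustrative choices, not part of the answer above:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'col': ['A=1;B=3;C=c6', 'A=2;C=c7;D=8']})

# extractall returns one row per match, indexed by (original row, 'match' number)
matches = df['col'].str.extractall(r'(?P<key>[^=;]+)=(?P<value>[^;]*)')

# Drop the match counter, index by (row, key), then spread the keys into columns.
# unstack raises an error if a key repeats within one row (see assumption above).
wide = (
    matches.droplevel('match')
           .set_index('key', append=True)['value']
           .unstack()
           .rename_axis(columns=None)
)
print(wide)
```

Unlike \w+, the value group [^;]* accepts any character up to the next semicolon.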

nicholishen answered Jan 20 '26 10:01