I have a data set like below:
name   status    number  message
matt   active    12345   [job: , money: none, wife: none]
james  active    23456   [group: band, wife: yes, money: 10000]
adam   inactive  34567   [job: none, money: none, wife: , kids: one, group: jail]
How can I extract the key-value pairs and turn them into a dataframe, expanded so that each key gets its own column?
Expected output:
name   status    number  job   money  wife  group  kids
matt   active    12345   none  none   none  none   none
james  active    23456   none  10000  none  band   none
adam   inactive  34567   none  none   none  none   one
The message contains multiple different key types.
Any help would be greatly appreciated.
It is not easy. You need to convert the values to dicts with replace (\s+ matches one or more whitespace characters) and then parse them with ast. Then it is possible to use the DataFrame constructor with concat; pop removes the column from df:
import ast
import pandas as pd

# Raw strings keep the regex escapes intact. The first pattern also consumes the
# whitespace after the comma, so the next key does not keep a leading space
# (which would otherwise produce extra columns such as ' money' and ' kids').
df.message = df.message.replace([r':\s+,\s*', r'\[', r'\]', r':\s+', r',\s+'],
                                ['":"none","', '{"', '"}', '":"', '","'], regex=True)
df.message = df.message.apply(ast.literal_eval)
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print(df1)
# column order may vary with the pandas version
  group   job kids  money  wife
0   NaN  none  NaN   none  none
1  band   NaN  NaN  10000   yes
2  jail  none  one   none  none
df = pd.concat([df, df1], axis=1)
print(df)
    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  none  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000   yes
2   adam  inactive   34567  jail  none  one   none  none
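To see what the chain of replacements does to a single message, here is a minimal sketch in plain Python. It assumes the rules are applied one after another, uses re.sub in place of the pandas replace call, and lets the first pattern swallow the whitespace after the comma so that keys do not keep a leading space. The names RULES and message_to_dict are hypothetical helpers for illustration only.

```python
import ast
import re

# (pattern, replacement) pairs, applied in this order.
RULES = [
    (r':\s+,\s*', '":"none","'),  # empty value -> the string "none"
    (r'\[', '{"'),                # opening bracket -> start of dict literal
    (r'\]', '"}'),                # closing bracket -> end of dict literal
    (r':\s+', '":"'),             # key/value separator
    (r',\s+', '","'),             # pair separator
]

def message_to_dict(message):
    """Rewrite a '[key: value, ...]' string into a dict literal and parse it."""
    for pattern, replacement in RULES:
        message = re.sub(pattern, replacement, message)
    return ast.literal_eval(message)

messages = [
    "[job: , money: none, wife: none]",
    "[group: band, wife: yes, money: 10000]",
    "[job: none, money: none, wife: , kids: one, group: jail]",
]
records = [message_to_dict(m) for m in messages]
```

Every value comes back as a string here (for example '10000'), because the rules wrap each value in quotes before ast.literal_eval parses it.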
EDIT:
Another solution with yaml:
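import yaml
import pandas as pd

# yaml.safe_load is used because recent PyYAML versions require a Loader argument for yaml.load.
# YAML typing applies: an empty value becomes None, yes becomes True, 10000 becomes an int.
df.message = df.message.replace([r'\[', r'\]'], ['{', '}'], regex=True).apply(yaml.safe_load)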
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print(df1)
  group   job kids  money  wife
0   NaN  None  NaN   none  none
1  band   NaN  NaN  10000  True
2  jail  none  one   none  None
df = pd.concat([df, df1], axis=1)
print(df)
    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  None  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000  True
2   adam  inactive   34567  jail  none  one   none  None
You labeled it as a list but say it's a dictionary, so if each message is already a dict this should work:
pd.concat([data.drop(['message'], axis=1), data['message'].apply(pd.Series)], axis=1)
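The expected output in the question shows none, not NaN, for keys a row lacks. Here is a minimal sketch of that normalization in plain Python, assuming the messages are already dicts. The records below are hypothetical stand-ins for the parsed messages, and "none" as the fill value is an assumption taken from the expected output.

```python
# Hypothetical records, shaped like the parsed message dicts.
records = [
    {"job": "none", "money": "none", "wife": "none"},
    {"group": "band", "wife": "yes", "money": "10000"},
    {"job": "none", "money": "none", "wife": "none", "kids": "one", "group": "jail"},
]

# Collect every key once, in order of first appearance.
columns = []
for record in records:
    for key in record:
        if key not in columns:
            columns.append(key)

# Give each row the full set of keys, filling gaps with the string "none".
filled = [{key: record.get(key, "none") for key in columns} for record in records]
```

If the data is already in a dataframe, calling fillna('none') on the expanded columns should give a similar result.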