I manually create a DataFrame:
import pandas as pd
df_articles1 = pd.DataFrame({'Id': [4, 5, 8, 9],
                             'Class': [
                                 {'encourage': 1, 'contacting': 1},
                                 {'cardinality': 16, 'subClassOf': 3},
                                 {'get-13.5.1': 1},
                                 {'cardinality': 12, 'encourage': 1}
                             ]})
I export it to a CSV file so that I can import it again later and split it:
df_articles1.to_csv(f"""{path}articles_split.csv""", index = False, sep=";")
I can split it with pd.json_normalize():
df_articles1 = pd.json_normalize(df_articles1['Class'])
Then I read the CSV file back into a DataFrame:
df_articles2 = pd.read_csv(f"""{path}articles_split.csv""", sep=";")
But calling pd.json_normalize(df_articles2['Class']) on it fails with:
AttributeError: 'str' object has no attribute 'values'
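A self-contained reproduction of the round trip may help pin this down. It writes to an in-memory io.StringIO buffer in place of the file under `path` (an assumption made only so the snippet runs anywhere) and shows that the 'Class' cells come back as plain strings:

```python
import io

import pandas as pd

# Same frame as above
df_articles1 = pd.DataFrame({
    'Id': [4, 5, 8, 9],
    'Class': [
        {'encourage': 1, 'contacting': 1},
        {'cardinality': 16, 'subClassOf': 3},
        {'get-13.5.1': 1},
        {'cardinality': 12, 'encourage': 1},
    ],
})

# Round-trip through an in-memory buffer instead of a file on `path`
buf = io.StringIO()
df_articles1.to_csv(buf, index=False, sep=";")
buf.seek(0)
df_articles2 = pd.read_csv(buf, sep=";")

# Each cell is now the text of the dict, not a dict
print(type(df_articles2.loc[0, 'Class']))  # <class 'str'>
print(df_articles2.loc[0, 'Class'])        # {'encourage': 1, 'contacting': 1}
```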
That happens because to_csv() stores the data in your 'Class' column as strings, not as dictionaries/JSON.
So after loading the saved data:
df_articles2 = pd.read_csv(f"""{path}articles_split.csv""", sep=";")
convert each string back to its original form by combining eval() with apply():
df_articles2['Class']=df_articles2['Class'].apply(lambda x:eval(x))
Finally:
resultdf=pd.json_normalize(df_articles2['Class'])
Now if you print resultdf, you will get your desired output.
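A minimal runnable sketch of this fix follows. It assumes the CSV contents match what the question's to_csv call writes, and hard-codes them via io.StringIO instead of reading from `path`. Keep in mind that eval runs whatever code is in each cell, so only use it on data you trust.

```python
import io

import pandas as pd

# Assumed file contents: what to_csv(..., sep=";") writes for the question's frame
csv_text = """Id;Class
4;{'encourage': 1, 'contacting': 1}
5;{'cardinality': 16, 'subClassOf': 3}
8;{'get-13.5.1': 1}
9;{'cardinality': 12, 'encourage': 1}
"""

df_articles2 = pd.read_csv(io.StringIO(csv_text), sep=";")

# Each cell is a str; eval turns the dict literal back into a dict
# (eval executes arbitrary code, so only use it on trusted input)
df_articles2['Class'] = df_articles2['Class'].apply(eval)

# One column per key; keys missing from a row become NaN
resultdf = pd.json_normalize(df_articles2['Class'])
print(resultdf)
```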
While the accepted answer works, using eval is bad practice: it executes whatever expression the string contains, not just dictionary literals.
To parse a string column that looks like JSON/dict, use one of the following options (the last one is best, if possible).
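To see the difference concretely, here is a small sketch with a hypothetical hostile cell. eval executes the call inside it, while ast.literal_eval accepts only Python literals and raises ValueError. The payload is a harmless os.getcwd() call chosen purely for illustration.

```python
import ast

safe_cell = "{'cardinality': 16, 'subClassOf': 3}"
# Hypothetical cell holding an expression instead of a literal
hostile_cell = "__import__('os').getcwd()"

# Both parsers handle a genuine dict literal
print(eval(safe_cell))              # {'cardinality': 16, 'subClassOf': 3}
print(ast.literal_eval(safe_cell))  # {'cardinality': 16, 'subClassOf': 3}

# eval imports a module and calls a function: arbitrary code execution
print(eval(hostile_cell))           # prints the current working directory

# literal_eval refuses anything that is not a literal
rejected = False
try:
    ast.literal_eval(hostile_cell)
except ValueError:
    rejected = True
print("literal_eval rejected the call:", rejected)  # True
```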
ast.literal_eval (better)
import ast
objects = df2['Class'].apply(ast.literal_eval)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)
# Id encourage contacting cardinality subClassOf get-13.5.1
# 0 4 1.0 1.0 NaN NaN NaN
# 1 5 NaN NaN 16.0 3.0 NaN
# 2 8 NaN NaN NaN NaN 1.0
# 3 9 1.0 NaN 12.0 NaN NaN
json.loads (even better)
import json
objects = df2['Class'].apply(json.loads)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)
# Id encourage contacting cardinality subClassOf get-13.5.1
# 0 4 1.0 1.0 NaN NaN NaN
# 1 5 NaN NaN 16.0 3.0 NaN
# 2 8 NaN NaN NaN NaN 1.0
# 3 9 1.0 NaN 12.0 NaN NaN
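json.loads only works if the column actually contains JSON. One way to guarantee that, sketched below under the assumption that you control how the file is written, is to serialize the dicts with json.dumps before calling to_csv. An in-memory buffer stands in for the real file:

```python
import io
import json

import pandas as pd

df1 = pd.DataFrame({
    'Id': [4, 5, 8, 9],
    'Class': [
        {'encourage': 1, 'contacting': 1},
        {'cardinality': 16, 'subClassOf': 3},
        {'get-13.5.1': 1},
        {'cardinality': 12, 'encourage': 1},
    ],
})

# Store each dict as JSON text (double quotes) rather than its Python repr
to_save = df1.assign(Class=df1['Class'].apply(json.dumps))
buf = io.StringIO()
to_save.to_csv(buf, index=False, sep=";")  # csv quoting escapes the inner "
buf.seek(0)

df2 = pd.read_csv(buf, sep=";")
objects = df2['Class'].apply(json.loads)   # valid JSON, so this parses
result = df2[['Id']].join(pd.json_normalize(objects))
print(result)
```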
If the strings are single-quoted, use str.replace to convert them to double quotes (and thus valid JSON) before applying json.loads (this only works if no key or value itself contains an apostrophe):
objects = df2['Class'].str.replace("'", '"').apply(json.loads)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)
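A quick check of that caveat, using a hypothetical cell whose value contains an apostrophe: the blanket replace also rewrites the apostrophe, so json.loads fails, while ast.literal_eval still parses the original text.

```python
import ast
import json

# Hypothetical cell: the text to_csv would write for this dict
cell = repr({'note': "it's fine", 'count': 2})
print(cell)     # {'note': "it's fine", 'count': 2}

# The blanket quote swap also hits the apostrophe inside the value
swapped = cell.replace("'", '"')
print(swapped)  # {"note": "it"s fine", "count": 2}

try:
    json.loads(swapped)
    swap_ok = True
except json.JSONDecodeError:
    swap_ok = False
print("json.loads after replace succeeded:", swap_ok)  # False

# ast.literal_eval parses the original Python-literal text correctly
print(ast.literal_eval(cell))
```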
pd.json_normalize before pd.to_csv (recommended)
If possible, when you first save to CSV, save the normalized columns rather than the raw JSON objects:
df1 = df1[['Id']].join(pd.json_normalize(df1['Class']))
df1.to_csv('df1_normalized.csv', index=False, sep=';')
# Id;encourage;contacting;cardinality;subClassOf;get-13.5.1
# 4;1.0;1.0;;;
# 5;;;16.0;3.0;
# 8;;;;;1.0
# 9;1.0;;12.0;;
This is a more natural CSV workflow (rather than storing/loading object blobs):
df2 = pd.read_csv('df1_normalized.csv', sep=';')
# Id encourage contacting cardinality subClassOf get-13.5.1
# 0 4 1.0 1.0 NaN NaN NaN
# 1 5 NaN NaN 16.0 3.0 NaN
# 2 8 NaN NaN NaN NaN 1.0
# 3 9 1.0 NaN 12.0 NaN NaN
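For completeness, here is a self-contained version of this recommended round trip. It rebuilds the question's frame and uses an io.StringIO buffer in place of 'df1_normalized.csv' so it runs without touching the disk:

```python
import io

import pandas as pd

df1 = pd.DataFrame({
    'Id': [4, 5, 8, 9],
    'Class': [
        {'encourage': 1, 'contacting': 1},
        {'cardinality': 16, 'subClassOf': 3},
        {'get-13.5.1': 1},
        {'cardinality': 12, 'encourage': 1},
    ],
})

# Flatten before saving: one plain column per dictionary key
df1 = df1[['Id']].join(pd.json_normalize(df1['Class']))

buf = io.StringIO()
df1.to_csv(buf, index=False, sep=";")  # NaN is written as an empty field
print(buf.getvalue())

# Reading back needs no parsing step at all
buf.seek(0)
df2 = pd.read_csv(buf, sep=";")
print(df2)
```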