
pd.json_normalize() gives "'str' object has no attribute 'values'"

I manually create a DataFrame:

import pandas as pd
df_articles1 = pd.DataFrame({'Id'   : [4,5,8,9],
                            'Class':[
                                        {'encourage': 1, 'contacting': 1},
                                        {'cardinality': 16, 'subClassOf': 3},
                                        {'get-13.5.1': 1},
                                        {'cardinality': 12, 'encourage': 1}
                                    ]
                            }) 

I export it to a CSV file so that I can import it again later:

df_articles1.to_csv(f"""{path}articles_split.csv""", index = False, sep=";")

On the original DataFrame, I can split the 'Class' column with pd.json_normalize():

df_articles1 = pd.json_normalize(df_articles1['Class'])

Then I import the CSV file into a new DataFrame:

df_articles2 = pd.read_csv(f"""{path}articles_split.csv""", sep=";") 

But normalizing the imported column fails:

pd.json_normalize(df_articles2['Class'])

AttributeError: 'str' object has no attribute 'values'

Theo75 asked Mar 01 '23 16:03



2 Answers

This happens because to_csv() writes each dictionary in the 'Class' column as its string representation, and read_csv() does not convert it back. After loading the saved data, the column therefore holds strings rather than dictionaries:

df_articles2 = pd.read_csv(f"""{path}articles_split.csv""", sep=";") 

To restore the original form, apply eval() to each value with apply():

df_articles2['Class']=df_articles2['Class'].apply(lambda x:eval(x))

Finally:

resultdf=pd.json_normalize(df_articles2['Class'])

Now if you print resultdf, you will get the desired output.
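To see the cause directly, here is a minimal sketch of the round trip. It assumes an in-memory io.StringIO buffer in place of the file on disk, and a two-row frame named df that stands in for the one in the question:

```python
import io

import pandas as pd

# Small stand-in for the DataFrame from the question.
df = pd.DataFrame({
    'Id': [4, 5],
    'Class': [
        {'encourage': 1, 'contacting': 1},
        {'cardinality': 16, 'subClassOf': 3},
    ],
})

# Write to an in-memory buffer instead of a file (assumption for the demo).
buffer = io.StringIO()
df.to_csv(buffer, index=False, sep=';')
print(buffer.getvalue())  # the dicts appear as plain text in the CSV

# Read the same text back.
buffer.seek(0)
df_back = pd.read_csv(buffer, sep=';')

# Before the round trip each cell is a dict; afterwards it is a str.
type_before = type(df['Class'].iloc[0])
type_after = type(df_back['Class'].iloc[0])
print(type_before, type_after)
```

Because each cell is now a str, pd.json_normalize() has no dictionary to unpack, which is where the AttributeError comes from.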

Anurag Dabas answered Apr 07 '23 00:04



While the accepted answer works, using eval is bad practice: it executes any Python expression it finds in the data, not just dictionary literals.
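As a rough illustration of the difference, ast.literal_eval only accepts Python literals and raises ValueError on anything else, whereas eval would run it. The string in unsafe_text below is a made-up example of hostile cell content, not something from the question's data:

```python
import ast

safe_text = "{'cardinality': 16, 'subClassOf': 3}"
# Hypothetical hostile cell content: a function call, not a literal.
unsafe_text = "__import__('os').getcwd()"

# A dictionary literal is parsed into a real dict.
parsed = ast.literal_eval(safe_text)
print(parsed)

# A function call is rejected instead of being executed.
try:
    ast.literal_eval(unsafe_text)
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```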

To parse a string column that looks like JSON/dict, use one of the following options (the last one is best, if possible). Here df2 is the DataFrame loaded from the CSV (df_articles2 in the question).


ast.literal_eval (better)

import ast

objects = df2['Class'].apply(ast.literal_eval)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)

#    Id  encourage  contacting  cardinality  subClassOf  get-13.5.1
# 0   4        1.0         1.0          NaN         NaN         NaN
# 1   5        NaN         NaN         16.0         3.0         NaN
# 2   8        NaN         NaN          NaN         NaN         1.0
# 3   9        1.0         NaN         12.0         NaN         NaN

json.loads (even better)

import json

objects = df2['Class'].apply(json.loads)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)

#    Id  encourage  contacting  cardinality  subClassOf  get-13.5.1
# 0   4        1.0         1.0          NaN         NaN         NaN
# 1   5        NaN         NaN         16.0         3.0         NaN
# 2   8        NaN         NaN          NaN         NaN         1.0
# 3   9        1.0         NaN         12.0         NaN         NaN

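If the strings are single-quoted (as they are when to_csv() has written Python dicts), use str.replace to convert them to double quotes (and thus valid JSON) before applying json.loads. Note that this simple replacement breaks if any key or value itself contains an apostrophe: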

objects = df2['Class'].str.replace("'", '"').apply(json.loads)
normed = pd.json_normalize(objects)
df2[['Id']].join(normed)
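If you control the writing side but still want to keep the objects in a single column, one way to avoid the quote replacement is to serialize each dict with json.dumps before calling to_csv, so the column holds valid JSON. The sketch below assumes an in-memory buffer instead of a file, and the 'note' value is invented to show that apostrophes survive:

```python
import io
import json

import pandas as pd

# Stand-in data; the 'note' entry is a made-up value containing an apostrophe.
df = pd.DataFrame({
    'Id': [4, 5],
    'Class': [
        {'encourage': 1, 'contacting': 1},
        {'note': "it's fine"},
    ],
})

# Serialize each dict to a JSON string before writing.
out = df.copy()
out['Class'] = out['Class'].apply(json.dumps)

buffer = io.StringIO()  # in-memory stand-in for the CSV file
out.to_csv(buffer, index=False, sep=';')
buffer.seek(0)

# On the reading side json.loads now works without any str.replace.
df_back = pd.read_csv(buffer, sep=';')
objects = df_back['Class'].apply(json.loads)
restored = objects.tolist()
print(restored)
```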

pd.json_normalize before DataFrame.to_csv (recommended)

If possible, when you originally save to CSV, just save the normalized JSON (not raw JSON objects):

df1 = df1[['Id']].join(pd.json_normalize(df1['Class']))
df1.to_csv('df1_normalized.csv', index=False, sep=';')

# Id;encourage;contacting;cardinality;subClassOf;get-13.5.1
# 4;1.0;1.0;;;
# 5;;;16.0;3.0;
# 8;;;;;1.0
# 9;1.0;;12.0;;

This is a more natural CSV workflow (rather than storing/loading object blobs):

df2 = pd.read_csv('df1_normalized.csv', sep=';')

#    Id  encourage  contacting  cardinality  subClassOf  get-13.5.1
# 0   4        1.0         1.0          NaN         NaN         NaN
# 1   5        NaN         NaN         16.0         3.0         NaN
# 2   8        NaN         NaN          NaN         NaN         1.0
# 3   9        1.0         NaN         12.0         NaN         NaN
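To check that this workflow round-trips cleanly, here is a small self-contained sketch. It rebuilds the frame from the question, uses an in-memory buffer in place of 'df1_normalized.csv' (an assumption for the demo), and compares the result with pd.testing.assert_frame_equal:

```python
import io

import pandas as pd

df1 = pd.DataFrame({
    'Id': [4, 5, 8, 9],
    'Class': [
        {'encourage': 1, 'contacting': 1},
        {'cardinality': 16, 'subClassOf': 3},
        {'get-13.5.1': 1},
        {'cardinality': 12, 'encourage': 1},
    ],
})

# Normalize first, so the CSV only ever holds plain columns.
normalized = df1[['Id']].join(pd.json_normalize(df1['Class'].tolist()))

buffer = io.StringIO()  # in-memory stand-in for the CSV file
normalized.to_csv(buffer, index=False, sep=';')
buffer.seek(0)

df_back = pd.read_csv(buffer, sep=';')

# Raises AssertionError if values, columns, or dtypes differ.
pd.testing.assert_frame_equal(normalized, df_back)
print(df_back)
```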
tdy answered Apr 07 '23 01:04
