I understand that NaN is not allowed in JSON files. I usually use
import pandas as pd
pd.read_json('file.json')
to read JSON into Python. Looking through the documentation, I don't see an option for handling that value.
I have a JSON file, data.json, that looks like this:
[{"city": "Los Angeles","job":"chef","age":30},
{"city": "New York","job":"driver","age":35},
{"city": "San Jose","job":"pilot","age":NaN}]
How can I read this into Python/pandas and handle the NaN values?
EDIT:
Amazing answer below! Thanks, fixxxer! Just so it's documented, here is how to read the data from a separate file:
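import pandas as pd
import json

# The with block closes the file once it has been read
with open('data.json', 'r') as text:
    x = text.read()

y = json.loads(x)  # json.loads accepts the non-standard NaN by default
data = pd.DataFrame(y)
data.head()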
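If you don't need the raw text in a variable, json.load can parse straight from the open file handle. The sketch below wraps that in a helper; the name load_json_with_nan is made up for illustration, and the sample data is written to a temporary file only so the example runs anywhere (in practice you would pass the path to your own data.json).

```python
import json
import os
import tempfile

import pandas as pd

# Sample content from the question; NaN is not valid JSON, but
# Python's json module accepts it by default.
SAMPLE = """[{"city": "Los Angeles","job":"chef","age":30},
{"city": "New York","job":"driver","age":35},
{"city": "San Jose","job":"pilot","age":NaN}]"""


def load_json_with_nan(path):
    """Hypothetical helper: read a JSON file that may contain NaN into a DataFrame."""
    with open(path, "r", encoding="utf-8") as fh:
        records = json.load(fh)  # parses directly from the file object
    return pd.DataFrame(records)


# Write the sample to a temporary file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(SAMPLE)
    df = load_json_with_nan(path)

print(df)
```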
Reading JSON files using pandas: to read a file, call read_json() and pass it the path to the JSON file. It returns a DataFrame (a table of rows and columns) that holds the data.
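A minimal sketch of that call on a file that is valid JSON, with the missing age written as null instead of NaN. The file name valid.json is made up, and the file is created in a temporary directory just for the example; pandas turns the null into NaN in the resulting DataFrame.

```python
import os
import tempfile

import pandas as pd

# Valid JSON: the missing age is written as null rather than NaN.
VALID = """[{"city": "Los Angeles", "job": "chef", "age": 30},
{"city": "New York", "job": "driver", "age": 35},
{"city": "San Jose", "job": "pilot", "age": null}]"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "valid.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(VALID)
    # read_json accepts a file path; a list of objects gives one row per object
    df = pd.read_json(path)

print(df)
```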
NaN, Infinity and -Infinity are not part of the JSON standard, but they are valid JavaScript number values, so they are common non-standard extensions. Python's json module accepts them by default.
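A short sketch of how the standard json module treats these constants: json.loads reads them into float values, json.dumps writes them back out by default, and allow_nan=False makes json.dumps raise ValueError instead of producing non-standard JSON.

```python
import json
import math

# Python's json module accepts the JavaScript-style constants by default.
values = json.loads('[NaN, Infinity, -Infinity, 1.5]')
print(values)  # [nan, inf, -inf, 1.5]

# By default json.dumps writes them back out, producing non-standard JSON.
print(json.dumps(values))  # [NaN, Infinity, -Infinity, 1.5]

# allow_nan=False makes json.dumps refuse to emit non-standard output.
try:
    json.dumps(values, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False  # raised because the list contains NaN/Infinity
```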
If you have JSON in a string, you can load it into a pandas DataFrame with read_json(). For a DataFrame, the default orientation is a dict-like format, {column -> {index -> value}}, which is called column orientation. Use the orient parameter to specify a different format; for example, orient='records' describes a list of objects like the data in the question.
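A sketch contrasting the two orientations on small made-up strings. The literal strings are wrapped in io.StringIO because, as far as I know, recent pandas versions warn when a raw JSON string is passed to read_json; wrapping it works with older versions too.

```python
from io import StringIO

import pandas as pd

# Column orientation: {column -> {index -> value}} (the DataFrame default).
columns_json = '{"city": {"0": "Los Angeles", "1": "New York"}, "age": {"0": 30, "1": 35}}'
# Record orientation: [{column -> value}, ...], the shape of the question's data.
records_json = '[{"city": "Los Angeles", "age": 30}, {"city": "New York", "age": 35}]'

df_columns = pd.read_json(StringIO(columns_json))
df_records = pd.read_json(StringIO(records_json), orient="records")

print(df_columns)
print(df_records)
```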
Read the JSON text into a variable (here it is written as a string literal):
x = '''[{"city": "Los Angeles","job":"chef","age":30}, {"city": "New York","job":"driver","age":35}, {"city": "San Jose","job":"pilot","age":NaN}]'''
Now load it with json.loads:
In [41]: import json
In [42]: y = json.loads(x)
In [43]: y
Out[43]:
[{u'age': 30, u'city': u'Los Angeles', u'job': u'chef'},
{u'age': 35, u'city': u'New York', u'job': u'driver'},
{u'age': nan, u'city': u'San Jose', u'job': u'pilot'}]
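If you would rather get None than a float NaN, json.loads takes a parse_constant hook, which the standard library calls with 'NaN', 'Infinity' or '-Infinity'. A sketch; the helper name constant_to_none is made up. One benefit is that json.dumps then writes the value as null, which is valid JSON.

```python
import json

x = '''[{"city": "Los Angeles","job":"chef","age":30}, {"city": "New York","job":"driver","age":35}, {"city": "San Jose","job":"pilot","age":NaN}]'''


def constant_to_none(name):
    """Hypothetical hook: json.loads calls this for 'NaN', 'Infinity', '-Infinity'."""
    return None


y = json.loads(x, parse_constant=constant_to_none)
print(y[2])              # {'city': 'San Jose', 'job': 'pilot', 'age': None}
print(json.dumps(y[2]))  # {"city": "San Jose", "job": "pilot", "age": null}
```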
Then pass the list of dicts to pd.DataFrame:
In [44]: pd.DataFrame(y)
Out[44]:
   age         city     job
0   30  Los Angeles    chef
1   35     New York  driver
2  NaN     San Jose   pilot
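Once the data is in a DataFrame, the NaN can be handled with the usual pandas missing-data tools. A sketch of a few common options; which one fits depends on what a missing age should mean for your data. Note that NaN forces the age column to float64, and the nullable "Int64" dtype is one way to keep integers alongside missing values.

```python
import json

import pandas as pd

x = '''[{"city": "Los Angeles","job":"chef","age":30}, {"city": "New York","job":"driver","age":35}, {"city": "San Jose","job":"pilot","age":NaN}]'''
df = pd.DataFrame(json.loads(x))

# NaN forces the age column to float64, since plain integers cannot hold NaN.
print(df["age"].dtype)  # float64

missing = df["age"].isna()             # boolean mask of missing ages
filled = df["age"].fillna(0)           # replace NaN with a placeholder value
dropped = df.dropna(subset=["age"])    # drop rows whose age is missing
nullable = df["age"].astype("Int64")   # nullable integer dtype; NaN becomes <NA>

print(nullable)
```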