
Read JSON with NaN into Python and Pandas

I understand that NaN is not allowed in JSON files. I usually use

import pandas as pd 
pd.read_json('file.json') 

to read JSON into Python. Looking through the documentation, I do not see an option to handle that value.

I have a JSON file, data.json, that looks like

[{"city": "Los Angeles","job":"chef","age":30},
 {"city": "New York","job":"driver","age":35},
 {"city": "San Jose","job":"pilot","age":NaN}]

How can I read this into Python/pandas and handle the NaN values?

EDIT:

Amazing answer below!! Thanks fixxxer!! Just so it's documented, reading it in from a separate file

import pandas as pd
import json

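# Read the whole file; the with-block closes it afterwards
with open('data.json', 'r') as text:
    x = text.read()

# json.loads accepts the non-standard NaN token by default
y = json.loads(x)
data = pd.DataFrame(y)
data.head()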
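
As a follow-up on "handling" the NaN once it is loaded, here is a minimal sketch. It assumes the file content is the three records shown above (inlined as a string so the snippet runs on its own); the names `raw`, `complete_rows` and `nullable_age` are just illustrative. Because NaN is a float, pandas stores the age column as float64, and from there the missing value can be dropped or kept in a nullable integer column.

```python
import json

import pandas as pd

# Inline copy of the data.json content from the question (assumption: same three records).
raw = '''[{"city": "Los Angeles","job":"chef","age":30},
 {"city": "New York","job":"driver","age":35},
 {"city": "San Jose","job":"pilot","age":NaN}]'''

data = pd.DataFrame(json.loads(raw))

# NaN is a float, so the age column ends up as float64.
age_dtype = data["age"].dtype

# Option 1: drop the rows that have no age.
complete_rows = data.dropna(subset=["age"])

# Option 2: keep every row and use pandas' nullable integer dtype,
# which shows the missing value as <NA>.
nullable_age = data["age"].astype("Int64")
```

Which option fits depends on whether the incomplete row is still useful for the rest of the analysis.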
Max asked Apr 26 '15 08:04



1 Answer

Read the contents of the JSON file into a string variable:

x = '''[{"city": "Los Angeles","job":"chef","age":30},  {"city": "New York","job":"driver","age":35},  {"city": "San Jose","job":"pilot","age":NaN}]'''

Now, load it with json.loads

In [41]: import json

In [42]: y = json.loads(x)

In [43]: y
Out[43]: 
[{u'age': 30, u'city': u'Los Angeles', u'job': u'chef'},
 {u'age': 35, u'city': u'New York', u'job': u'driver'},
 {u'age': nan, u'city': u'San Jose', u'job': u'pilot'}]

And,

In [44]: pd.DataFrame(y)
Out[44]: 
   age         city     job
0   30  Los Angeles    chef
1   35     New York  driver
2  NaN     San Jose   pilot
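
The reason this works is that Python's json module accepts NaN, Infinity and -Infinity by default, even though they are not valid JSON. If a different treatment is wanted, the `parse_constant` argument of `json.loads` is called with the token name and its return value is used instead. The sketch below is only an illustration of that hook; `reject_constant` is a hypothetical helper, not something from the question.

```python
import json
import math

raw = '''[{"city": "Los Angeles","job":"chef","age":30},
 {"city": "New York","job":"driver","age":35},
 {"city": "San Jose","job":"pilot","age":NaN}]'''

# Default behaviour: the NaN token is parsed as float('nan').
default_rows = json.loads(raw)

# parse_constant receives 'NaN', 'Infinity' or '-Infinity' and returns
# the Python value to use in its place; here every such token becomes None.
none_rows = json.loads(raw, parse_constant=lambda token: None)


def reject_constant(token):
    # Hypothetical helper: make the parser strict about non-standard tokens.
    raise ValueError("non-standard JSON constant: " + token)


try:
    json.loads(raw, parse_constant=reject_constant)
    strict_ok = True
except ValueError:
    strict_ok = False
```

Passing `none_rows` to `pd.DataFrame` still gives a missing value in the age column, so the choice mostly matters when the parsed list is used outside pandas.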
fixxxer answered Sep 23 '22 19:09