Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas read json not working on MultiIndex

I'm trying to read in a dataframe created via df.to_json() via pd.read_json but I'm getting a ValueError. I think it may have to do with the fact that the index is a MultiIndex but I'm not sure how to deal with that.

The original dataframe of 55k rows is called psi and I created test.json via:

psi.head().to_json('test.json')

Hereis the output of print psi.head().to_string() if you want to use that.

When I do it on this small set of data (5 rows), I get a ValueError.

! wget --no-check-certificate https://gist.githubusercontent.com/olgabot/9897953/raw/c270d8cf1b736676783cc1372b4f8106810a14c5/test.json
import pandas as pd
pd.read_json('test.json')

Here's the full stack:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-1de2f0e65268> in <module>()
      1 get_ipython().system(u' wget https://gist.githubusercontent.com/olgabot/9897953/raw/c270d8cf1b736676783cc1372b4f8106810a14c5/test.json'>)
      2 import pandas as pd
----> 3 pd.read_json('test.json')

/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
    196         obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
    197                           keep_default_dates, numpy, precise_float,
--> 198                           date_unit).parse()
    199 
    200     if typ == 'series' or obj is None:

/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
    264 
    265         else:
--> 266             self._parse_no_numpy()
    267 
    268         if self.obj is None:

/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self)
    481         if orient == "columns":
    482             self.obj = DataFrame(
--> 483                 loads(json, precise_float=self.precise_float), dtype=None)
    484         elif orient == "split":
    485             decoded = dict((str(k), v)

ValueError: No ':' found when decoding object value

> /home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.py(483)_parse_no_numpy()
    482             self.obj = DataFrame(
--> 483                 loads(json, precise_float=self.precise_float), dtype=None)
    484         elif orient == "split":

But when I do it on the whole dataframe (55k rows) then I get an invalid pointer error and the IPython kernel dies. Any ideas?

EDIT: added how the json was generated in the first place.

like image 261
Olga Botvinnik Avatar asked Mar 31 '14 17:03

Olga Botvinnik


People also ask

Can I read JSON with pandas?

Reading JSON Files using PandasTo read the files, we use read_json() function and through it, we pass the path to the JSON file we want to read. Once we do that, it returns a “DataFrame”( A table of rows and columns) that stores data.

What does the pandas function MultiIndex From_tuples do?

from_tuples() function is used to convert list of tuples to MultiIndex. It is one of the several ways in which we construct a MultiIndex.

What is a MultiIndex DataFrame?

A MultiIndex (also known as a hierarchical index) DataFrame allows you to have multiple columns acting as a row identifier and multiple rows acting as a header identifier. With MultiIndex, you can do some sophisticated data analysis, especially for working with higher dimensional data.


1 Answers

This is not implemented ATM, see the issue here: https://github.com/pydata/pandas/issues/4889.

You can simply reset the index first, e.g

df.reset_index().to_json(...)

and it will work.

like image 135
Jeff Avatar answered Oct 27 '22 15:10

Jeff