Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert Pandas DataFrame to JSON as element of larger data structure

I've been working with pandas DataFrame objects in my server, converting them to CSV for transmission to the browser, where the tabular values are plotted using d3. While CSV is file as far as it goes, I really need more than just a 2D table of data. If nothing else, I'd like to return some metadata about the data.

So I started messing around with JSON thinking I would be able to construct a dictionary with some meta information and my DataFrame. For example, just as an absurdly simple example:

>>> z = numpy.zeros(10)
>>> df = pandas.DataFrame(z)
>>> df
   0
0  0
1  0
2  0
3  0
4  0
5  0
6  0
7  0
8  0
9  0
>>> result = {
...   "name": "Simple Example",
...   "data": df,
... }

Not surprisingly, that can't be directly serialized using the json module. I found the jsonext module and tried it. It "works", but produces incomplete results:

>>> jsonext.dumps(result)
'{"data": ["0"], "name": "Simple Example"}'

Looking at the methods DataFrame itself provides for this sort of thing, I found to_dict() and to_json(). The former produces dictionaries of dictionaries:

>>> df.to_dict()
{0: {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0}}

but as you can see, they can't be serialized to JSON, since the keys are not strings.

df.to_json() looked like it might work, though I would then wind up with a JSON string embedded in aanother JSON string. Something like this:

json.dumps({"name": "Simple Example", "data": df.to_json()}) '{"data": "{\"0\":{\"0\":0.0,\"1\":0.0,\"2\":0.0,\"3\":0.0,\"4\":0.0,\"5\":0.0,\"6\":0.0,\"7\":0.0,\"8\":0.0,\"9\":0.0}}", "name": "Simple Example"}'

In other words, a bit of a mess.

Any suggestions about how to handle this sort of nested structure where some of the elements can't be directly serialized? I think I might be able to get jsonext to work, but its Dict mixin expects to find a proper (in its mind) to_dict() method. DataFrame.to_dict() doesn't seem to return the right thing. (Though I will continue to horse around with it.)

I figured this must be a cat which has already been skinned. I just haven't found it. I'd be happy for now with nothing more hierarchical than something like my example (though with more key/value pairs), though I won't turn my nose up at a more general solution.

like image 814
smontanaro Avatar asked Oct 07 '14 20:10

smontanaro


1 Answers

The default function (supplied to json.dumps) gets called for all objects that can't be serialized by default. It can return any object that the default encoder can serialize, such as a dict.

df.to_json() returns a string. json.loads(df.to_json) returns a dict with keys which are strings. So if we set default=lambda df: json.loads(df.to_json()) then the DataFrame will get serialized as though it were a dict.

import json
import numpy as np
import pandas as pd

z = np.zeros(10)
df = pd.DataFrame(z)
result = {"name": "Simple Example",
          "data": df, }

jstr = json.dumps(result,
                   default=lambda df: json.loads(df.to_json()))
newresult = json.loads(jstr)
print(newresult)
# {u'data': {u'0': {u'0': 0.0,
#    u'1': 0.0,
#    u'2': 0.0,
#    u'3': 0.0,
#    u'4': 0.0,
#    u'5': 0.0,
#    u'6': 0.0,
#    u'7': 0.0,
#    u'8': 0.0,
#    u'9': 0.0}},
#  u'name': u'Simple Example'}


print(pd.DataFrame(newresult['data']))

yields

   0
0  0
1  0
2  0
3  0
4  0
5  0
6  0
7  0
8  0
9  0
like image 166
unutbu Avatar answered Sep 20 '22 13:09

unutbu