I've been working with pandas DataFrame objects on my server, converting them to CSV for transmission to the browser, where the tabular values are plotted using d3. While CSV is fine as far as it goes, I really need more than just a 2D table of data. If nothing else, I'd like to return some metadata about the data.
So I started messing around with JSON, thinking I would be able to construct a dictionary holding some meta information alongside my DataFrame. As an absurdly simple example:
>>> z = numpy.zeros(10)
>>> df = pandas.DataFrame(z)
>>> df
   0
0  0
1  0
2  0
3  0
4  0
5  0
6  0
7  0
8  0
9  0
>>> result = {
... "name": "Simple Example",
... "data": df,
... }
Not surprisingly, that can't be directly serialized using the json module. I found the jsonext module and tried it. It "works", but produces incomplete results:
>>> jsonext.dumps(result)
'{"data": ["0"], "name": "Simple Example"}'
Looking at the methods DataFrame itself provides for this sort of thing, I found to_dict() and to_json(). The former produces dictionaries of dictionaries:
>>> df.to_dict()
{0: {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0}}
but as you can see, they can't be serialized to JSON, since the keys are not strings.
df.to_json() looked like it might work, though I would then wind up with a JSON string embedded in another JSON string. Something like this:
json.dumps({"name": "Simple Example", "data": df.to_json()}) '{"data": "{\"0\":{\"0\":0.0,\"1\":0.0,\"2\":0.0,\"3\":0.0,\"4\":0.0,\"5\":0.0,\"6\":0.0,\"7\":0.0,\"8\":0.0,\"9\":0.0}}", "name": "Simple Example"}'
In other words, a bit of a mess.
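Just to make the nesting concrete, whoever consumes that would have to decode twice: once for the wrapper and once more for the embedded DataFrame string. Roughly, as a Python sketch of what the JavaScript side would have to mimic:
>>> outer = json.loads(json.dumps({"name": "Simple Example", "data": df.to_json()}))
>>> inner = json.loads(outer["data"])  # second decode, just for the DataFrame piece
>>> inner["0"]["3"]
0.0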
Any suggestions about how to handle this sort of nested structure where some of the elements can't be directly serialized? I think I might be able to get jsonext to work, but its Dict mixin expects to find a proper (in its mind) to_dict() method. DataFrame.to_dict() doesn't seem to return the right thing. (Though I will continue to horse around with it.)
I figure this must be a cat that has already been skinned; I just haven't found it. For now I'd be happy with nothing more hierarchical than my example (though with more key/value pairs), but I won't turn my nose up at a more general solution.
The default function (supplied to json.dumps) gets called for every object the encoder can't serialize on its own. It can return any object the encoder does know how to serialize, such as a dict. df.to_json() returns a string, and json.loads(df.to_json()) returns a dict whose keys are strings. So if we set default=lambda df: json.loads(df.to_json()), the DataFrame will get serialized as though it were a dict.
import json

import numpy as np
import pandas as pd

z = np.zeros(10)
df = pd.DataFrame(z)

result = {"name": "Simple Example",
          "data": df}

# default= is consulted only for objects json.dumps can't serialize itself
# (here, the DataFrame); whatever it returns is encoded in the object's place.
jstr = json.dumps(result,
                  default=lambda df: json.loads(df.to_json()))

newresult = json.loads(jstr)
print(newresult)
# {u'data': {u'0': {u'0': 0.0,
# u'1': 0.0,
# u'2': 0.0,
# u'3': 0.0,
# u'4': 0.0,
# u'5': 0.0,
# u'6': 0.0,
# u'7': 0.0,
# u'8': 0.0,
# u'9': 0.0}},
# u'name': u'Simple Example'}
print(pd.DataFrame(newresult['data']))
yields
   0
0  0
1  0
2  0
3  0
4  0
5  0
6  0
7  0
8  0
9  0
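Two small follow-ups. First, after the round trip the column and index labels come back as strings ('0' through '9'), so you may need to convert them if that matters downstream. Second, if you'd rather not repeat the lambda at every call site, the same idea can be packaged as a json.JSONEncoder subclass. Here's a minimal sketch (Python 3 syntax), assuming DataFrames are the only special case you need:

import json

import numpy as np
import pandas as pd

class DataFrameEncoder(json.JSONEncoder):
    # Turn any embedded DataFrame into a plain dict; defer everything
    # else to the base class (which raises TypeError as usual).
    def default(self, obj):
        if isinstance(obj, pd.DataFrame):
            return json.loads(obj.to_json())
        return super().default(obj)

result = {"name": "Simple Example", "data": pd.DataFrame(np.zeros(10))}
jstr = json.dumps(result, cls=DataFrameEncoder)
print(json.loads(jstr)["name"])  # -> Simple Example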