
Pandas to D3. Serializing dataframes to JSON

I have a DataFrame with the following columns and no duplicates:

['region', 'type', 'name', 'value']

that can be seen as a hierarchy as follows

grouped = df.groupby(['region','type', 'name'])

I would like to serialize this hierarchy as a JSON object.

If anyone is interested, the motivation is to eventually put together a visualization like this one, which requires a JSON file.

To do so, I need to convert grouped into the following:

new_data['children'][i]['name'] = region
new_data['children'][i]['children'][j]['name'] = type
new_data['children'][i]['children'][j]['children'][k]['name'] = name
new_data['children'][i]['children'][j]['children'][k]['size'] = value
...

where region, type, and name correspond to the different levels of the hierarchy (indexed by i, j, and k).
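For concreteness, here is a sketch of what that target structure might look like for a hypothetical two-row frame. The row values ('EU', 'fruit', 'apple', 'pear') and the root label 'root' are made up for illustration; the question does not specify them.

```python
import json

# Hypothetical target for two rows:
#   ('EU', 'fruit', 'apple', 3) and ('EU', 'fruit', 'pear', 5)
new_data = {
    "name": "root",  # assumed root label, not given in the question
    "children": [
        {
            "name": "EU",  # region level (index i)
            "children": [
                {
                    "name": "fruit",  # type level (index j)
                    "children": [
                        {"name": "apple", "size": 3},  # name level (index k)
                        {"name": "pear", "size": 5},
                    ],
                }
            ],
        }
    ],
}

print(json.dumps(new_data, indent=2))
```

With this shape, new_data['children'][0]['children'][0]['children'][1]['size'] is the value of the second row.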

Is there an easy way in Pandas/Python to do this?

Amelio Vazquez-Reina asked May 08 '14 01:05


2 Answers

Something along these lines might get you there.

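import json
from collections import defaultdict

tree = lambda: defaultdict(tree)  # a recursive defaultdict
d = tree()
# iterrows() yields (index, row); each row unpacks in column order
for _, (region, type_, name, value) in df.iterrows():  # type_ avoids shadowing the built-in type
    d['children'][region]['name'] = region
    ...

json.dumps(d)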

A vectorized solution would be better, and maybe something that takes advantage of the speed of groupby, but I can't think of such a solution.
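One way to lean on groupby, short of a fully vectorized solution, is to nest one groupby per level. This is only a sketch under assumptions: the helper name to_hierarchy, the root label, and the toy data are all hypothetical, and it assumes the region, type, and name columns hold strings and that value is numeric.

```python
import json

import pandas as pd


def to_hierarchy(df, root_name="root"):
    """Nest region -> type -> name rows into name/children/size dicts."""
    root = {"name": root_name, "children": []}
    # sort=False keeps groups in order of first appearance
    for region, region_df in df.groupby("region", sort=False):
        region_node = {"name": region, "children": []}
        for type_, type_df in region_df.groupby("type", sort=False):
            leaves = [
                # cast to float so json can encode it regardless of NumPy dtype
                {"name": name, "size": float(value)}
                for name, value in zip(type_df["name"], type_df["value"])
            ]
            region_node["children"].append({"name": type_, "children": leaves})
        root["children"].append(region_node)
    return root


# Toy data, made up for illustration
df = pd.DataFrame(
    {
        "region": ["EU", "EU", "US"],
        "type": ["fruit", "fruit", "veg"],
        "name": ["apple", "pear", "kale"],
        "value": [3, 5, 2],
    }
)

result = to_hierarchy(df)
print(json.dumps(result, indent=2))
```

The children here are lists rather than dicts keyed by region, which matches the indexed access (i, j, k) described in the question.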

Also take a look at df.groupby(...).groups, which returns a dict mapping each group key to the index labels of the rows in that group.
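As a small illustration of what .groups holds, here is a sketch on made-up toy data (the column values are hypothetical). When grouping by two columns, the keys are (region, type) tuples.

```python
import pandas as pd

# Toy data, made up for illustration
df = pd.DataFrame(
    {
        "region": ["EU", "EU", "US"],
        "type": ["fruit", "fruit", "veg"],
        "name": ["apple", "pear", "kale"],
        "value": [3, 5, 2],
    }
)

groups = df.groupby(["region", "type"]).groups

# Each key is a (region, type) tuple; each value holds the index labels
# of the rows that belong to that group.
for key, labels in groups.items():
    print(key, list(labels))
```

The labels can then be passed to df.loc to pull out the rows for a given group.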

See also this answer.

Dan Allan answered Sep 30 '22 14:09


Here's another script to take a pandas df and output a flare.json file: https://github.com/andrewheekin/csv2flare.json

Andrew Heekin answered Sep 30 '22 14:09