I have a DataFrame with the following columns and no duplicates:
['region', 'type', 'name', 'value']
that can be seen as a hierarchy as follows:
grouped = df.groupby(['region','type', 'name'])
I would like to serialize this hierarchy as a JSON object.
If anyone is interested, the motivation behind this is to eventually put together a visualization like this one, which requires a JSON file.
To do so, I need to convert grouped into the following:
new_data['children'][i]['name'] = region
new_data['children'][i]['children'][j]['name'] = type
new_data['children'][i]['children'][j]['children'][k]['name'] = name
new_data['children'][i]['children'][j]['children'][k]['size'] = value
...
where region, type, and name correspond to the different levels of the hierarchy (indexed by i, j, and k).
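For concreteness, here is what new_data would look like for a tiny, made-up frame. The region/type/name values, the sizes, and the "root" label are hypothetical; the point is the shape: every level is a dict with a name and a children list, and the leaves carry size instead of children.

```python
import json

# Hypothetical instance of the target shape: region -> type -> name, leaves carry 'size'.
new_data = {
    "name": "root",  # assumption: the question does not name the root node
    "children": [
        {
            "name": "north",  # region (i = 0)
            "children": [
                {
                    "name": "retail",  # type (j = 0)
                    "children": [
                        {"name": "shop_a", "size": 10},  # name (k = 0) and its value
                        {"name": "shop_b", "size": 5},   # name (k = 1) and its value
                    ],
                }
            ],
        }
    ],
}

print(json.dumps(new_data["children"][0]["children"][0]["children"][1]))
```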
Is there an easy way in Pandas/Python to do this?
Pandas read_json(): this API reads JSON data and works well for data that is already flat (one JSON object per row); reading such a file converts it directly into a flat table.
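A minimal sketch of reading already-flat JSON with pd.read_json. The records below are hypothetical; orient="records" is passed explicitly, and the string is wrapped in io.StringIO because recent pandas versions deprecate passing a literal JSON string.

```python
import io

import pandas as pd

# Hypothetical flat JSON: a list of records, one object per row.
flat_json = """[
  {"region": "north", "type": "retail", "name": "shop_a", "value": 10},
  {"region": "north", "type": "retail", "name": "shop_b", "value": 5}
]"""

# Each record becomes one row; the object keys become the columns.
df = pd.read_json(io.StringIO(flat_json), orient="records")
print(df)
```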
Some parameters of DataFrame.to_json():
orient: indication of the expected JSON string format.
date_format: {None, 'epoch', 'iso'}.
double_precision: the number of decimal places to use when encoding floating-point values.
force_ascii: force the encoded string to be ASCII.
date_unit: string, default 'ms' (milliseconds).
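A small sketch exercising several of these to_json parameters together on a hypothetical frame. The exact ISO timestamp string can vary slightly between pandas versions, so only its prefix is relied on here.

```python
import json

import pandas as pd

# Hypothetical frame with a string, a float and a datetime column.
df = pd.DataFrame({
    "name": ["a", "b"],
    "value": [1.23456, 2.5],
    "when": pd.to_datetime(["2020-01-01", "2020-01-02"]),
})

# orient="records"   -> a JSON list with one object per row
# date_format="iso"  -> ISO 8601 strings instead of epoch numbers
# double_precision=2 -> floats encoded with 2 decimal places
# date_unit="s"      -> timestamps encoded at second precision
out = df.to_json(orient="records", date_format="iso", double_precision=2, date_unit="s")
records = json.loads(out)
print(out)
```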
The most straightforward way to drop a pandas DataFrame index is the .reset_index() method. By default, it moves the current index into a regular column and replaces it with a default integer index running from 0 to len(df) - 1; pass drop=True to discard the old index instead of keeping it as a column.
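A short illustration of the two reset_index behaviours on a hypothetical frame whose index is named "key":

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 5]}, index=pd.Index(["x", "y"], name="key"))

kept = df.reset_index()              # default: old index becomes a 'key' column
dropped = df.reset_index(drop=True)  # drop=True: old index is discarded

print(list(kept.columns))
print(list(dropped.columns))
print(list(dropped.index))
```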
Something along these lines might get you there.
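import json
from collections import defaultdict
tree = lambda: defaultdict(tree)  # a recursive defaultdict: missing keys create new subtrees
d = tree()
for _, (region, type_, name, value) in df.iterrows():  # type_ avoids shadowing the builtin type
    region_node = d['children'][region]
    region_node['name'] = region
    type_node = region_node['children'][type_]
    type_node['name'] = type_
    leaf = type_node['children'][name]
    leaf['name'] = name
    leaf['size'] = value
# note: 'children' here are dicts keyed by label; the flare format expects lists
json.dumps(d)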
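The recursive defaultdict approach produces children keyed by label, whereas a flare-style file expects lists. A minimal sketch of that last conversion step, assuming string labels; to_flare, the rows sample, and the "root" label are hypothetical, and rows stands in for df.iterrows().

```python
import json
from collections import defaultdict


def tree():
    # Recursive defaultdict: indexing a missing key creates a new empty subtree.
    return defaultdict(tree)


def to_flare(node):
    """Convert {'children': {label: subnode}, ...} into {'children': [subnode, ...], ...}."""
    out = {k: v for k, v in node.items() if k != "children"}
    if "children" in node:  # a membership test does not create the key in a defaultdict
        out["children"] = [to_flare(child) for child in node["children"].values()]
    return out


# Hypothetical rows standing in for df.iterrows(): (region, type, name, value).
rows = [
    ("north", "retail", "shop_a", 10),
    ("north", "retail", "shop_b", 5),
    ("south", "online", "site_c", 7),
]

d = tree()
d["name"] = "root"  # assumption: root label
for region, type_, name, value in rows:
    region_node = d["children"][region]
    region_node["name"] = region
    type_node = region_node["children"][type_]
    type_node["name"] = type_
    leaf = type_node["children"][name]
    leaf["name"] = name
    leaf["size"] = value  # leaves never get a 'children' key

flare = to_flare(d)
print(json.dumps(flare, indent=2))
```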
A vectorized solution would be better, and maybe something that takes advantage of the speed of groupby, but I can't think of such a solution.
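One groupby-based alternative, sketched under the assumption that the columns are named as in the question and the labels are strings. It is still a Python-level recursion over groups, not a fully vectorized solution; df_to_flare, its parameters, and the sample data are hypothetical.

```python
import json

import pandas as pd


def df_to_flare(df, levels=("region", "type"), leaf="name", size="value", root="root"):
    """Nest rows into flare-style dicts by grouping on one level at a time."""

    def build(sub, remaining):
        if not remaining:
            # Innermost level: one leaf per row. Iterating a Series yields Python scalars.
            return [{"name": n, "size": s} for n, s in zip(sub[leaf], sub[size])]
        key, rest = remaining[0], remaining[1:]
        # sort=False keeps groups in order of first appearance.
        return [
            {"name": k, "children": build(group, rest)}
            for k, group in sub.groupby(key, sort=False)
        ]

    return {"name": root, "children": build(df, list(levels))}


df = pd.DataFrame({
    "region": ["north", "north", "south"],
    "type": ["retail", "retail", "online"],
    "name": ["shop_a", "shop_b", "site_c"],
    "value": [10, 5, 7],
})
flare = df_to_flare(df)
print(json.dumps(flare, indent=2))
```

Grouping by a single column name at each step keeps each group key a plain scalar, which maps directly onto the name field at that level.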
Also take a look at df.groupby(...).groups, which returns a dict mapping each group key to the index labels of the rows in that group.
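A quick look at what .groups returns when grouping on two of the hierarchy levels, using hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "north", "south"],
    "type": ["retail", "retail", "online"],
    "name": ["shop_a", "shop_b", "site_c"],
    "value": [10, 5, 7],
})

groups = df.groupby(["region", "type"]).groups
# Keys are (region, type) tuples; values are the index labels of the rows in each group.
for key, idx in groups.items():
    print(key, list(idx))
```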
See also this answer.
Here's another script to take a pandas df and output a flare.json file: https://github.com/andrewheekin/csv2flare.json