Pandas to D3. Serializing dataframes to JSON

Tags: python, json, pandas, d3.js

I have a DataFrame with the following columns and no duplicates:

['region', 'type', 'name', 'value']

that can be seen as a hierarchy as follows

grouped = df.groupby(['region','type', 'name'])

I would like to serialize this hierarchy as a JSON object.

If anyone is interested, the motivation behind this is to eventually put together a visualization like this one which requires a JSON file.

To do so, I need to convert grouped into the following:

new_data['children'][i]['name'] = region
new_data['children'][i]['children'][j]['name'] = type
new_data['children'][i]['children'][j]['children'][k]['name'] = name
new_data['children'][i]['children'][j]['children'][k]['size'] = value
...

where region, type and name correspond to the different levels of the hierarchy (indexed by i, j and k).
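
For concreteness, the target layout might look like the plain Python dict below. This is only an illustrative sketch: the sample values ("Europe", "fruit", "apple", 10) and the root label "root" are made up and do not come from the actual data.

```python
import json

# Hypothetical sample values; only the nesting mirrors the layout described above.
new_data = {
    "name": "root",  # assumed root label, not specified in the question
    "children": [
        {
            "name": "Europe",  # region (index i)
            "children": [
                {
                    "name": "fruit",  # type (index j)
                    "children": [
                        # name and value (index k)
                        {"name": "apple", "size": 10},
                    ],
                },
            ],
        },
    ],
}

print(json.dumps(new_data, indent=2))
```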

Is there an easy way in Pandas/Python to do this?

asked May 08 '14 01:05

Amelio Vazquez-Reina

2 Answers

Something along these lines might get you there.

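import json
from collections import defaultdict

tree = lambda: defaultdict(tree)  # a recursive defaultdict: missing keys create nested trees
d = tree()
# Assumes df has exactly these four columns, in this order.
# type_ avoids shadowing the built-in type().
for _, (region, type_, name, value) in df.iterrows():
    d['children'][region]['name'] = region
    ...

json.dumps(d)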

A vectorized solution would be better, and maybe something that takes advantage of the speed of groupby, but I can't think of such a solution.

Also take a look at df.groupby(...).groups, which returns a dict.
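
One possible way to lean on groupby, offered as a rough sketch rather than a tested recipe: group on one level at a time and recurse into each group. The helper name to_hierarchy, its parameters, the root label and the sample data are all hypothetical. The sketch assumes the level columns hold strings, and it uses tolist() so that NumPy scalars become plain Python values that json.dumps can handle.

```python
import json

import pandas as pd


def to_hierarchy(df, levels, value_col, root_name="root"):
    """Nest df into {'name', 'children'} dicts, one level per column in `levels`."""

    def build(frame, remaining):
        if len(remaining) == 1:
            # Leaf level: one node per row, carrying the value as 'size'.
            names = frame[remaining[0]].tolist()
            sizes = frame[value_col].tolist()  # tolist() yields native Python numbers
            return [{"name": n, "size": s} for n, s in zip(names, sizes)]
        # Inner level: one node per group, recursing into the remaining columns.
        # sort=False keeps groups in order of first appearance.
        return [
            {"name": key, "children": build(group, remaining[1:])}
            for key, group in frame.groupby(remaining[0], sort=False)
        ]

    return {"name": root_name, "children": build(df, levels)}


# Hypothetical sample data with the four columns from the question.
df = pd.DataFrame(
    {
        "region": ["Europe", "Europe", "Asia"],
        "type": ["fruit", "fruit", "grain"],
        "name": ["apple", "pear", "rice"],
        "value": [10, 5, 7],
    }
)

flare = to_hierarchy(df, ["region", "type", "name"], "value")
print(json.dumps(flare, indent=2))
```

This still loops in Python over the groups, so it is not truly vectorized, but it returns lists of children (as the question's indexing by i, j and k implies) rather than dicts keyed by name.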

Dan Allan

Here's another script to take a pandas df and output a flare.json file: https://github.com/andrewheekin/csv2flare.json

answered Sep 30 '22 14:09

Andrew Heekin
