
Pandas Dataframe to Nested JSON

I am trying to convert a Pandas Dataframe to a JSON object. My Dataframe contains data in the following format:

     student        date  grade   course
0  Student_1  2017-06-25     93  ENGLISH
1  Student_2  2017-06-25     83  ENGLISH
2  Student_1  2017-06-25     93     MATH
3  Student_2  2017-06-25     83     MATH
4  Student_1  2017-06-26     90     MATH
5  Student_2  2017-06-26     85     MATH
6  Student_1  2017-06-26     96  ENGLISH
7  Student_2  2017-06-26     99  ENGLISH

I want to convert it to a JSON object in the following format:

[
    {'ENGLISH': [
      {
        'date' : '2017-06-25',
        'Student_1' : 93,
        'Student_2' : 83
      },

      {
        'date' : '2017-06-26',
        'Student_1' : 96,
        'Student_2' : 99
      }]
   },

    {'MATH': [
      {
        'date' : '2017-06-25',
        'Student_1' : 93,
        'Student_2' : 83
      },

      {
        'date' : '2017-06-26',
        'Student_1' : 90,
        'Student_2' : 85
      }]
    }
]

A simple .to_json() call did not do the trick for me. Is there any way I can create the JSON object in the required format in Pandas?

Nishant Roy asked Jun 25 '17 20:06





2 Answers

Try this:

file.csv:

student,date,grade,course
0,Student_1,2017-06-25,93,ENGLISH
1,Student_2,2017-06-25,83,ENGLISH
2,Student_1,2017-06-25,93,MATH
3,Student_2,2017-06-25,83,MATH
4,Student_1,2017-06-26,90,MATH
5,Student_2,2017-06-26,85,MATH
6,Student_1,2017-06-26,96,ENGLISH
7,Student_2,2017-06-26,99,ENGLISH
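
Note that every data row in file.csv has one more field than the header line. In that case pandas.read_csv uses the first column as the row index, so the four header names still line up with the right columns. A small sketch to check this, using an inline two-row copy of the file (the csv_text name is only for illustration):

```python
import io

import pandas as pd

# Two-row excerpt of file.csv: 4 header names, 5 fields per data row.
csv_text = """student,date,grade,course
0,Student_1,2017-06-25,93,ENGLISH
1,Student_2,2017-06-25,83,ENGLISH
"""

df = pd.read_csv(io.StringIO(csv_text))

# The extra leading field becomes the index, not a data column.
print(df.columns.tolist())
print(df.index.tolist())
```

If your own file has no leading index field, the header and the rows simply have the same number of fields and nothing changes in the code below.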

Execute:

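from collections import defaultdict

import json
import pandas as pd


# Each row has one more field than the header, so the first column becomes the index.
df = pd.read_csv('file.csv')

# Map each course to a list of per-date records.
json_doc = defaultdict(list)
for row in df.itertuples(index=False):
    for elt in json_doc[row.course]:
        if elt["date"] == row.date:
            # A record for this date already exists: add this student's grade to it.
            elt[row.student] = int(row.grade)
            break
    else:
        # No record for this date yet: start a new one.
        values = {'date': row.date, row.student: int(row.grade)}
        json_doc[row.course].append(values)

print(json.dumps(json_doc, indent=4))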

Output:

{
    "ENGLISH": [
        {
            "date": "2017-06-25",
            "Student_1": 93,
            "Student_2": 83
        },
        {
            "date": "2017-06-26",
            "Student_1": 96,
            "Student_2": 99
        }
    ],
    "MATH": [
        {
            "date": "2017-06-25",
            "Student_1": 93,
            "Student_2": 83
        },
        {
            "date": "2017-06-26",
            "Student_1": 90,
            "Student_2": 85
        }
    ]
}
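
The output above maps each course to its list of records, whereas the question asks for a list of single-key objects. A minimal sketch of that last step, assuming json_doc has the shape printed above (the inline sample and the records name are only for illustration):

```python
import json

# Sample with the same shape as the output above (copied here for illustration).
json_doc = {
    "ENGLISH": [
        {"date": "2017-06-25", "Student_1": 93, "Student_2": 83},
        {"date": "2017-06-26", "Student_1": 96, "Student_2": 99},
    ],
    "MATH": [
        {"date": "2017-06-25", "Student_1": 93, "Student_2": 83},
        {"date": "2017-06-26", "Student_1": 90, "Student_2": 85},
    ],
}

# Wrap every course in its own single-key dict, as in the requested layout.
records = [{course: rows} for course, rows in json_doc.items()]

print(json.dumps(records, indent=4))
```

The same comprehension works on the defaultdict built in the loop, since it only relies on .items().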
glegoux answered Oct 10 '22 09:10



You can first define a function that converts each (course, date) sub-group to a dict, apply it to every group, and then collect the per-group dicts into a single object keyed by course.

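def f(x):
    # One record per (course, date) group: the date plus a student -> grade entry per row.
    return dict({'date': x.date.iloc[0]}, **{k: v for k, v in zip(x.student, x.grade)})

(
    df.groupby(['course', 'date'])
      .apply(f)                      # one dict per (course, date)
      .groupby(level=0)              # regroup by course
      .apply(lambda x: x.tolist())   # collect each course's dicts into a list
      .to_dict()
)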
Out[1006]: 
{'ENGLISH': [{'Student_1': 93, 'Student_2': 83, 'date': '2017-06-25'},
  {'Student_1': 96, 'Student_2': 99, 'date': '2017-06-26'}],
 'MATH': [{'Student_1': 93, 'Student_2': 83, 'date': '2017-06-25'},
  {'Student_1': 90, 'Student_2': 85, 'date': '2017-06-26'}]}
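
As a variation on the same idea, the per-date records can also be produced by reshaping instead of a custom function. This is only a sketch of an alternative, not the method used above: it assumes each (course, date, student) combination appears once, and the wide and result names are made up for the example.

```python
import json

import pandas as pd

# Same data as in the question, built inline so the example is self-contained.
df = pd.DataFrame({
    "student": ["Student_1", "Student_2"] * 4,
    "date": ["2017-06-25"] * 4 + ["2017-06-26"] * 4,
    "grade": [93, 83, 93, 83, 90, 85, 96, 99],
    "course": ["ENGLISH", "ENGLISH", "MATH", "MATH",
               "MATH", "MATH", "ENGLISH", "ENGLISH"],
})

# One row per (course, date), one column per student.
wide = (
    df.set_index(["course", "date", "student"])["grade"]
      .unstack("student")
      .reset_index()
)

# For each course, turn its rows into a list of {date, student: grade, ...} dicts.
result = {
    course: group.drop(columns="course").to_dict(orient="records")
    for course, group in wide.groupby("course")
}

# default=int is a safeguard in case the grades come back as NumPy integers.
print(json.dumps(result, indent=4, default=int))
```

Reshaping keeps the student columns explicit, which may be convenient if some students are missing on some dates, but then the missing grades show up as NaN and need separate handling.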
Allen answered Oct 10 '22 10:10
