I am trying to convert a Pandas Dataframe to a JSON object. My Dataframe contains data in the following format:
student date grade course
0 Student_1 2017-06-25 93 ENGLISH
1 Student_2 2017-06-25 83 ENGLISH
2 Student_1 2017-06-25 93 MATH
3 Student_2 2017-06-25 83 MATH
4 Student_1 2017-06-26 90 MATH
5 Student_2 2017-06-26 85 MATH
6 Student_1 2017-06-26 96 ENGLISH
7 Student_2 2017-06-26 99 ENGLISH
I want to convert it to a JSON object in the following format:
[
{'ENGLISH': [
{
'date' : '2017-06-25',
'Student_1' : 93,
'Student_2' : 83
},
{
'date' : '2017-06-26',
'Student_1' : 96,
'Student_2' : 89
}]
},
{'MATH': [
{
'date' : '2017-06-25',
'Student_1' : 93,
'Student_2' : 83
},
{
'date' : '2017-06-26',
'Student_1' : 90,
'Student_2' : 85
}]
}
]
A simple .to_json()
call did not do the trick for me. Is there anyway I can create the JSON object in the required format in Pandas?
Nested JSON is a JSON file with a big portion of its values being other JSON objects. Compared with Simple JSON, Nested JSON provides higher clarity in that it decouples objects into different layers, making it easier to maintain.
Normalize semi-structured JSON data into a flat table. New in version 0.20.
Python has built in functions that easily imports JSON files as a Python dictionary or a Pandas dataframe. Use pd. read_json() to load simple JSONs and pd. json_normalize() to load nested JSONs.
Try that :
file.csv:
student,date,grade,course
0,Student_1,2017-06-25,93,ENGLISH
1,Student_2,2017-06-25,83,ENGLISH
2,Student_1,2017-06-25,93,MATH
3,Student_2,2017-06-25,83,MATH
4,Student_1,2017-06-26,90,MATH
5,Student_2,2017-06-26,85,MATH
6,Student_1,2017-06-26,96,ENGLISH
7,Student_2,2017-06-26,99,ENGLISH
Execute:
from collections import defaultdict
import json
import pandas as pd
df = pd.read_csv('file.csv')
json_doc = defaultdict(list)
for _id in df.T:
data = df.T[_id]
key = data.course
for elt in json_doc[key]:
if elt["date"] == data.date:
elt[data.student] = data.grade
break
else:
values = {'date': data.date, data.student: data.grade}
json_doc[key].append(values)
print(json.dumps(json_doc, indent=4))
Output:
{
"ENGLISH": [
{
"date": "2017-06-25",
"Student_1": 93,
"Student_2": 83
},
{
"date": "2017-06-26",
"Student_1": 96,
"Student_2": 99
}
],
"MATH": [
{
"date": "2017-06-25",
"Student_1": 93,
"Student_2": 83
},
{
"date": "2017-06-26",
"Student_1": 90,
"Student_2": 85
}
]
}
You can first define a function to convert sub-groups to json, then apply this function to each group, and then merge sub-group jsons to a single json object.
def f(x):
return (dict({'date':x.date.iloc[0]},**{k:v for k,v in zip(x.student,x.grade)}))
(
df.groupby(['course','date'])
.apply(f)
.groupby(level=0)
.apply(lambda x: x.tolist())
.to_dict()
)
Out[1006]:
{'ENGLISH': [{'Student_1': 93, 'Student_2': 83, 'date': '2017-06-25'},
{'Student_1': 96, 'Student_2': 99, 'date': '2017-06-26'}],
'MATH': [{'Student_1': 93, 'Student_2': 83, 'date': '2017-06-25'},
{'Student_1': 90, 'Student_2': 85, 'date': '2017-06-26'}]}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With