I'm looking for a Python technique to build a nested JSON file from a flat table in a pandas data frame. For example how could a pandas data frame table such as:
teamname member firstname lastname orgname phone mobile
0 1 0 John Doe Anon 916-555-1234
1 1 1 Jane Doe Anon 916-555-4321 916-555-7890
2 2 0 Mickey Moose Moosers 916-555-0000 916-555-1111
3 2 1 Minny Moose Moosers 916-555-2222
be taken and exported to a JSON that looks like:
{
"teams": [
{
"teamname": "1",
"members": [
{
"firstname": "John",
"lastname": "Doe",
"orgname": "Anon",
"phone": "916-555-1234",
"mobile": "",
},
{
"firstname": "Jane",
"lastname": "Doe",
"orgname": "Anon",
"phone": "916-555-4321",
"mobile": "916-555-7890",
}
]
},
{
"teamname": "2",
"members": [
{
"firstname": "Mickey",
"lastname": "Moose",
"orgname": "Moosers",
"phone": "916-555-0000",
"mobile": "916-555-1111",
},
{
"firstname": "Minny",
"lastname": "Moose",
"orgname": "Moosers",
"phone": "916-555-2222",
"mobile": "",
}
]
}
]
}
I have tried doing this by creating a dict of dicts and dumping to JSON. This is my current code:
data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')
memberDictTuple = []
for index, row in data.iterrows():
dataRow = row
rowDict = dict(zip(columnList[2:], dataRow[2:]))
teamRowDict = {columnList[0]:int(dataRow[0])}
memberId = tuple(row[1:2])
memberId = memberId[0]
teamName = tuple(row[0:1])
teamName = teamName[0]
memberDict1 = {int(memberId):rowDict}
memberDict2 = {int(teamName):memberDict1}
memberDictTuple.append(memberDict2)
memberDictTuple = tuple(memberDictTuple)
formattedJson = json.dumps(memberDictTuple, indent = 4, sort_keys = True)
print formattedJson
This produces the following output. Each item is nested at the correct level under "teamname" 1 or 2, but records should be nested together if they have the same teamname. How can I fix this so that teamname 1 and teamname 2 each have 2 records nested within?
[
{
"1": {
"0": {
"email": "[email protected]",
"firstname": "John",
"lastname": "Doe",
"mobile": "none",
"orgname": "Anon",
"phone": "916-555-1234"
}
}
},
{
"1": {
"1": {
"email": "[email protected]",
"firstname": "Jane",
"lastname": "Doe",
"mobile": "916-555-7890",
"orgname": "Anon",
"phone": "916-555-4321"
}
}
},
{
"2": {
"0": {
"email": "[email protected]",
"firstname": "Mickey",
"lastname": "Moose",
"mobile": "916-555-1111",
"orgname": "Moosers",
"phone": "916-555-0000"
}
}
},
{
"2": {
"1": {
"email": "[email protected]",
"firstname": "Minny",
"lastname": "Moose",
"mobile": "none",
"orgname": "Moosers",
"phone": "916-555-2222"
}
}
}
]
Nested JSON is simply a JSON file with a fairly big portion of its values being other JSON objects. Compared with Simple JSON, Nested JSON provides higher clarity in that it decouples objects into different layers, making it easier to maintain. Using Phrase, keys will be stored by separating levels with a dot.
A resource representation, in the JSON format, is a complex JSON object. Lists of items and nested data structures are represented as JSON arrays and nested JSON objects.
Steps for Appending to a JSON File In Python, appending JSON to a file consists of the following steps: Read the JSON in Python dict or list object. Append the JSON to dict (or list ) object by modifying it. Write the updated dict (or list ) object into the original file.
Pandas have a nice inbuilt function called json_normalize() to flatten the simple to moderately semi-structured nested JSON structures to flat tables. Parameters: data – dict or list of dicts.
This is the a solution that works and creates the desired JSON format. First, I grouped my dataframe by the appropriate columns, then instead of creating a dictionary (and losing data order) for each column heading/record pair, I created them as lists of tuples, then transformed the list into an Ordered Dict. Another Ordered Dict was created for the two columns that everything else was grouped by. Precise layering between lists and ordered dicts was necessary to for the JSON conversion to produce the correct format. Also note that when dumping to JSON, sort_keys must be set to false, or all your Ordered Dicts will be rearranged into alphabetical order.
import pandas
import json
from collections import OrderedDict
inputExcel = 'E:\\teams.xlsx'
exportJson = 'E:\\teams.json'
data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')
# This creates a tuple of column headings for later use matching them with column data
cols = []
columnList = list(data[0:])
for col in columnList:
cols.append(str(col))
columnList = tuple(cols)
#This groups the dataframe by the 'teamname' and 'members' columns
grouped = data.groupby(['teamname', 'members']).first()
#This creates a reference to the index level of the groups
groupnames = data.groupby(["teamname", "members"]).grouper.levels
tm = (groupnames[0])
#Create a list to add team records to at the end of the first 'for' loop
teamsList = []
for teamN in tm:
teamN = int(teamN) #added this in to prevent TypeError: 1 is not JSON serializable
tempList = [] #Create an temporary list to add each record to
for index, row in grouped.iterrows():
dataRow = row
if index[0] == teamN: #Select the record in each row of the grouped dataframe if its index matches the team number
#In order to have the JSON records come out in the same order, I had to first create a list of tuples, then convert to and Ordered Dict
rowDict = ([(columnList[2], dataRow[0]), (columnList[3], dataRow[1]), (columnList[4], dataRow[2]), (columnList[5], dataRow[3]), (columnList[6], dataRow[4]), (columnList[7], dataRow[5])])
rowDict = OrderedDict(rowDict)
tempList.append(rowDict)
#Create another Ordered Dict to keep 'teamname' and the list of members from the temporary list sorted
t = ([('teamname', str(teamN)), ('members', tempList)])
t= OrderedDict(t)
#Append the Ordered Dict to the emepty list of teams created earlier
ListX = t
teamsList.append(ListX)
#Create a final dictionary with a single item: the list of teams
teams = {"teams":teamsList}
#Dump to JSON format
formattedJson = json.dumps(teams, indent = 1, sort_keys = False) #sort_keys MUST be set to False, or all dictionaries will be alphebetized
formattedJson = formattedJson.replace("NaN", '"NULL"') #"NaN" is the NULL format in pandas dataframes - must be replaced with "NULL" to be a valid JSON file
print formattedJson
#Export to JSON file
parsed = open(exportJson, "w")
parsed.write(formattedJson)
print"\n\nExport to JSON Complete"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With