How to build a JSON file with nested records from a flat data table?

Tags:

I'm looking for a Python technique to build a nested JSON file from a flat table in a pandas data frame. For example how could a pandas data frame table such as:

teamname  member firstname lastname  orgname         phone        mobile
0        1       0      John      Doe     Anon  916-555-1234                 
1        1       1      Jane      Doe     Anon  916-555-4321  916-555-7890   
2        2       0    Mickey    Moose  Moosers  916-555-0000  916-555-1111   
3        2       1     Minny    Moose  Moosers  916-555-2222

be taken and exported to a JSON that looks like:

{
"teams": [
{
"teamname": "1",
"members": [
  {
    "firstname": "John", 
    "lastname": "Doe",
    "orgname": "Anon",
    "phone": "916-555-1234",
    "mobile": "",
  },
  {
    "firstname": "Jane",
    "lastname": "Doe",
    "orgname": "Anon",
    "phone": "916-555-4321",
    "mobile": "916-555-7890",
  }
]
},
{
"teamname": "2",
"members": [
  {
    "firstname": "Mickey",
    "lastname": "Moose",
    "orgname": "Moosers",
    "phone": "916-555-0000",
    "mobile": "916-555-1111",
  },
  {
    "firstname": "Minny",
    "lastname": "Moose",
    "orgname": "Moosers",
    "phone": "916-555-2222",
    "mobile": "",
  }
]
}       
]

}

I have tried doing this by creating a dict of dicts and dumping to JSON. This is my current code:

data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')
memberDictTuple = [] 

for index, row in data.iterrows():
    dataRow = row
    rowDict = dict(zip(columnList[2:], dataRow[2:]))

    teamRowDict = {columnList[0]:int(dataRow[0])}

    memberId = tuple(row[1:2])
    memberId = memberId[0]

    teamName = tuple(row[0:1])
    teamName = teamName[0]

    memberDict1 = {int(memberId):rowDict}
    memberDict2 = {int(teamName):memberDict1}

    memberDictTuple.append(memberDict2)

memberDictTuple = tuple(memberDictTuple)
formattedJson = json.dumps(memberDictTuple, indent = 4, sort_keys = True)
print formattedJson

This produces the following output. Each item is nested at the correct level under "teamname" 1 or 2, but records should be nested together if they have the same teamname. How can I fix this so that teamname 1 and teamname 2 each have 2 records nested within?

[
    {
        "1": {
            "0": {
                "email": "[email protected]", 
                "firstname": "John", 
                "lastname": "Doe", 
                "mobile": "none", 
                "orgname": "Anon", 
                "phone": "916-555-1234"
            }
        }
    }, 
    {
        "1": {
            "1": {
                "email": "[email protected]", 
                "firstname": "Jane", 
                "lastname": "Doe", 
                "mobile": "916-555-7890", 
                "orgname": "Anon", 
                "phone": "916-555-4321"
            }
        }
    }, 
    {
        "2": {
            "0": {
                "email": "[email protected]", 
                "firstname": "Mickey", 
                "lastname": "Moose", 
                "mobile": "916-555-1111", 
                "orgname": "Moosers", 
                "phone": "916-555-0000"
            }
        }
    }, 
    {
        "2": {
            "1": {
                "email": "[email protected]", 
                "firstname": "Minny", 
                "lastname": "Moose", 
                "mobile": "none", 
                "orgname": "Moosers", 
                "phone": "916-555-2222"
            }
        }
    }
]

657

asked Jun 08 '16 21:06

spaine

1 Answers

This is the a solution that works and creates the desired JSON format. First, I grouped my dataframe by the appropriate columns, then instead of creating a dictionary (and losing data order) for each column heading/record pair, I created them as lists of tuples, then transformed the list into an Ordered Dict. Another Ordered Dict was created for the two columns that everything else was grouped by. Precise layering between lists and ordered dicts was necessary to for the JSON conversion to produce the correct format. Also note that when dumping to JSON, sort_keys must be set to false, or all your Ordered Dicts will be rearranged into alphabetical order.

import pandas
import json
from collections import OrderedDict

inputExcel = 'E:\\teams.xlsx'
exportJson = 'E:\\teams.json'

data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')

# This creates a tuple of column headings for later use matching them with column data
cols = []
columnList = list(data[0:])
for col in columnList:
    cols.append(str(col))
columnList = tuple(cols)

#This groups the dataframe by the 'teamname' and 'members' columns
grouped = data.groupby(['teamname', 'members']).first()

#This creates a reference to the index level of the groups
groupnames = data.groupby(["teamname", "members"]).grouper.levels
tm = (groupnames[0])

#Create a list to add team records to at the end of the first 'for' loop
teamsList = []

for teamN in tm:
    teamN = int(teamN)  #added this in to prevent TypeError: 1 is not JSON serializable
    tempList = []   #Create an temporary list to add each record to
    for index, row in grouped.iterrows():
        dataRow = row
        if index[0] == teamN:  #Select the record in each row of the grouped dataframe if its index matches the team number

            #In order to have the JSON records come out in the same order, I had to first create a list of tuples, then convert to and Ordered Dict
            rowDict = ([(columnList[2], dataRow[0]), (columnList[3], dataRow[1]), (columnList[4], dataRow[2]), (columnList[5], dataRow[3]), (columnList[6], dataRow[4]), (columnList[7], dataRow[5])])
            rowDict = OrderedDict(rowDict)
            tempList.append(rowDict)
    #Create another Ordered Dict to keep 'teamname' and the list of members from the temporary list sorted
    t = ([('teamname', str(teamN)), ('members', tempList)])
    t= OrderedDict(t)

    #Append the Ordered Dict to the emepty list of teams created earlier
    ListX = t
    teamsList.append(ListX)


#Create a final dictionary with a single item: the list of teams
teams = {"teams":teamsList} 

#Dump to JSON format
formattedJson = json.dumps(teams, indent = 1, sort_keys = False) #sort_keys MUST be set to False, or all dictionaries will be alphebetized
formattedJson = formattedJson.replace("NaN", '"NULL"') #"NaN" is the NULL format in pandas dataframes - must be replaced with "NULL" to be a valid JSON file
print formattedJson

#Export to JSON file
parsed = open(exportJson, "w")
parsed.write(formattedJson)

print"\n\nExport to JSON Complete"

193

answered Oct 19 '22 05:10

spaine

Related questions
                            
                                Day delta for dates >292 years apart
                            
                                How do I connect to MySQL without using a password? (PyMySQL)
                            
                                Py2Exe Error: Missing run-py3.5-win-amd64.exe
                            
                                Django URL without trailing slash not working
                            
                                Why is My Minimax Not Expanding and Making Moves Correctly?
                            
                                How to unfocus (blur) Python-gi GTK+3 window on Linux
                            
                                Managing connection creation in Python?
                            
                                Python - Increase recursion limit in mac osx
                            
                                Exporting Interactive Jupyter Notebook to html
                            
                                Access a page that require Safenet USB Token from urllib2 ot httplib
                            
                                SQLAlchemy - bulk insert ignore: "Duplicate entry"
                            
                                Python Pandas: Why is numpy so much faster than Pandas for column assignment? Can I optimize further?
                            
                                Python - eyeD3 Lame tag CRC check failed
                            
                                Custom pip install commands not running
                            
                                Pycharm shows "Unresolved reference" all of a sudden
                            
                                How to plot a ROC curve with Tensorflow and scikit-learn?
                            
                                Fixing Python 3 tkinter issues on Mac with a virtualenv
                            
                                Python: A list that doesn't get relocated
                            
                                Getting started with Syntaxnet
                            
                                Using a Tensorflow input pipeline with skflow/tf learn

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to build a JSON file with nested records from a flat data table?

Tags:

python

json

pandas

nested

spaine

People also ask

1 Answers

spaine

Recent Activity

Donate For Us