
Python - Adding fields and labels to nested json file

I have a dataframe as follows:

Name_ID | URL                    | Count | Rating
------------------------------------------------
ABC     | www.example.com/ABC    | 10    | 5
123     | www.example.com/123    | 9     | 4
XYZ     | www.example.com/XYZ    | 5     | 2
ABC111  | www.example.com/ABC111 | 5     | 2
ABC121  | www.example.com/ABC121 | 5     | 2
222     | www.example.com/222    | 5     | 3
abc222  | www.example.com/abc222 | 4     | 2
ABCaaa  | www.example.com/ABCaaa | 4     | 2

I am trying to create a JSON as follows:

{
    "name": "sampledata",
    "children": [
        {
            "name": 9,
            "children": [
                {
                    "name": 4,
                    "children": [
                        {
                            "name": "123",
                            "size": 100
                        }
                    ]
                }
            ]
        },
        {
            "name": 10,
            "children": [
                {
                    "name": 5,
                    "children": [
                        {
                            "name": "ABC",
                            "size": 100
                        }
                    ]
                }
            ]
        },
        {
            "name": 4,
            "children": [
                {
                    "name": 2,
                    "children": [
                        {
                            "name": "abc222",
                            "size": 50
                        },
                        {
                            "name": "ABCaaa",
                            "size": 50
                        }
                    ]
                }
            ]
        },
        {
            "name": 5,
            "children": [
                {
                    "name": 2,
                    "children": [
                        {
                            "name": "ABC",
                            "size": 16
                        },
                        {
                            "name": "ABC111",
                            "size": 16
                        },
                        {
                            "name": "ABC121",
                            "size": 16
                        }
                    ]
                },
                {
                    "name": 3,
                    "children": [
                        {
                            "name": "222",
                            "size": 50
                        }
                    ]
                }
            ]
        }
    ]
}

In order to do that:

  • I am trying to add labels such as "name" and "children" to the json while creating it.

I tried something like

results = [{"name": i, "children": j} for i,j in results.items()]

But I don't believe it labels the data properly.

  • Also, add another field with the label `"size"`, which I am planning to calculate based on the formula:

    (Rating*Count*10000)/number_of_children_to_the_immediate_parent
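
As a worked illustration of the formula, here is a minimal sketch for two groups from the sample data. The helper name `leaf_size` is hypothetical and only used for this example; it assumes Python 3, where `/` is true division.

```python
# Hypothetical helper (not from the original post) that applies the formula
# (Rating * Count * 10000) / number_of_children_to_the_immediate_parent.
def leaf_size(rating, count, n_siblings):
    return (rating * count * 10000) / n_siblings

# Count=5, Rating=2 has three rows in the sample data (XYZ, ABC111, ABC121).
shared = leaf_size(rating=2, count=5, n_siblings=3)

# Count=10, Rating=5 has a single row (ABC).
single = leaf_size(rating=5, count=10, n_siblings=1)

print(shared, single)
```

With true division the shared value is fractional (100000 / 3), so an integer result needs `//` or an explicit `int(...)`.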
    

Here is my dirty code:

import pandas as pd
from collections import defaultdict
import json

data = [
    ('ABC', 'www.example.com/ABC', 10, 5),
    ('123', 'www.example.com/123', 9, 4),
    ('XYZ', 'www.example.com/XYZ', 5, 2),
    ('ABC111', 'www.example.com/ABC111', 5, 2),
    ('ABC121', 'www.example.com/ABC121', 5, 2),
    ('222', 'www.example.com/222', 5, 3),
    ('abc222', 'www.example.com/abc222', 4, 2),
    ('ABCaaa', 'www.example.com/ABCaaa', 4, 2),
]

df = pd.DataFrame(data, columns=['Name', 'URL', 'Count', 'Rating'])

gp = df.groupby(['Count'])

dict_json = {"name": "flare"}
children = []

for name, group in gp:
    temp = {}
    temp["name"] = name
    temp["children"] = []

    rgp = group.groupby(['Rating'])
    for n, g in rgp:
        temp2 = {}
        temp2["name"] = n
        temp2["children"] = list(g.reset_index().T.to_dict().values())
        for t in temp2["children"]:
            t["size"] = (t["Rating"] * t["Count"] * 10000) / len(temp2["children"])
            t["name"] = t["Name"]
            del t["Count"]
            del t["Rating"]
            del t["URL"]
            del t["Name"]
            del t["index"]
        temp["children"].append(temp2)
    children.append(temp)

dict_json["children"] = children

print(json.dumps(dict_json, indent=4))

Though the above code does print what I need, I am looking for a more efficient and cleaner way to do the same, mainly because the actual dataset might be even more nested and complicated. Any help or suggestions will be much appreciated.

kingmakerking asked Dec 18 '16 15:12



3 Answers

Quite an interesting problem and a great question!

You can improve your approach by reorganizing the code inside the loops and using list comprehensions. No need to delete things and introduce temp variables inside loops:

dict_json = {"name": "flare"}

children = []
for name, group in gp:
    temp = {"name": name, "children": []}

    rgp = group.groupby(['Rating'])
    for n, g in rgp:
        temp["children"].append({
            "name": n,
            "children": [
                {"name": row["Name"],
                 "size": row["Rating"] * row["Count"] * 10000 / len(g)}
                for _, row in g.iterrows()
            ]
        })

    children.append(temp)

dict_json["children"] = children

Or, a "wrapped" version:

dict_json = {
    "name": "flare", 
    "children": [
        {
            "name": name, 
            "children": [
                {
                    "name": n,
                    "children": [
                        {
                            "name": row["Name"],
                            "size": row["Rating"] * row["Count"] * 10000 / len(g)
                        } for _, row in g.iterrows()
                    ]
                } for n, g in group.groupby(['Rating'])
            ]
        } for name, group in gp
    ]
}

I'm getting the following dictionary printed for your sample input dataframe:

{
    "name": "flare", 
    "children": [
        {
            "name": 4, 
            "children": [
                {
                    "name": 2, 
                    "children": [
                        {
                            "name": "abc222", 
                            "size": 40000
                        }, 
                        {
                            "name": "ABCaaa", 
                            "size": 40000
                        }
                    ]
                }
            ]
        }, 
        {
            "name": 5, 
            "children": [
                {
                    "name": 2, 
                    "children": [
                        {
                            "name": "XYZ", 
                            "size": 33333
                        }, 
                        {
                            "name": "ABC111", 
                            "size": 33333
                        }, 
                        {
                            "name": "ABC121", 
                            "size": 33333
                        }
                    ]
                }, 
                {
                    "name": 3, 
                    "children": [
                        {
                            "name": "222", 
                            "size": 150000
                        }
                    ]
                }
            ]
        }, 
        {
            "name": 9, 
            "children": [
                {
                    "name": 4, 
                    "children": [
                        {
                            "name": "123", 
                            "size": 360000
                        }
                    ]
                }
            ]
        }, 
        {
            "name": 10, 
            "children": [
                {
                    "name": 5, 
                    "children": [
                        {
                            "name": "ABC", 
                            "size": 500000
                        }
                    ]
                }
            ]
        }
    ]
}
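
The sizes in the output above (for example 33333) are whole numbers, which is what `/` gives for integers on Python 2. On Python 3 the same expression returns floats, and with recent pandas versions the group keys may come back as one-element tuples (when grouping by a one-element list) or as NumPy integers that `json.dumps` may reject. A minimal sketch of a Python 3 variant, assuming grouping by plain column names, floor division, and explicit `int(...)`/`str(...)` casts (none of which are in the answer above):

```python
import json
import pandas as pd

data = [("ABC", 10, 5), ("123", 9, 4), ("XYZ", 5, 2), ("ABC111", 5, 2),
        ("ABC121", 5, 2), ("222", 5, 3), ("abc222", 4, 2), ("ABCaaa", 4, 2)]
df = pd.DataFrame(data, columns=["Name", "Count", "Rating"])

dict_json = {
    "name": "flare",
    "children": [
        {
            # Grouping by a column name (not a list) yields scalar keys.
            "name": int(count),
            "children": [
                {
                    "name": int(rating),
                    "children": [
                        # Floor division keeps the size an integer on Python 3.
                        {"name": str(name),
                         "size": int(rating) * int(count) * 10000 // len(g)}
                        for name in g["Name"]
                    ],
                }
                for rating, g in group.groupby("Rating")
            ],
        }
        for count, group in df.groupby("Count")
    ],
}

print(json.dumps(dict_json, indent=4))
```

The casts to built-in `int` and `str` are there so that the result contains only types the `json` module serializes directly.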

alecxe answered Oct 25 '22 18:10


setup

from io import StringIO
import pandas as pd

txt = """Name_ID,URL,Count,Rating
ABC,www.example.com/ABC,10,5
123,www.example.com/123,9,4
XYZ,www.example.com/XYZ,5,2
ABC111,www.example.com/ABC111,5,2
ABC121,www.example.com/ABC121,5,2
222,www.example.com/222,5,3
abc222,www.example.com/abc222,4,2
ABCaaa,www.example.com/ABCaaa,4,2"""

df = pd.read_csv(StringIO(txt))

size
pre-calculate it

df['size'] = df.Count.mul(df.Rating) \
                     .mul(10000) \
                     .div(df.groupby(
                        ['Count', 'Rating']).Name_ID.transform('count')
                     ).astype(int)
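
To see what the `transform('count')` step contributes, here is a small sketch on a four-row subset of the sample data. It assumes floor division (`//`) in place of `.div(...).astype(int)`, which gives the same integers for these values; the variable name `siblings` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name_ID": ["XYZ", "ABC111", "ABC121", "222"],
    "Count": [5, 5, 5, 5],
    "Rating": [2, 2, 2, 3],
})

# transform('count') returns one value per original row:
# the number of rows sharing that row's (Count, Rating) pair.
siblings = df.groupby(["Count", "Rating"])["Name_ID"].transform("count")

# Each row's size is Count * Rating * 10000 divided by its sibling count.
df["size"] = (df["Count"] * df["Rating"] * 10000 // siblings).astype(int)

print(df)
```

Because `transform` keeps the original index, the result aligns row by row with `df` and can be assigned straight to a new column.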

solution
create recursive function

def h(d):
    if isinstance(d, pd.Series): d = d.to_frame().T
    rec_cond = d.index.nlevels > 1 or d.index.nunique() > 1
    return {'name': str(d.index[0]), 'size': str(d['size'].iloc[0])} if not rec_cond else \
        [dict(name=str(n), children=h(g.xs(n))) for n, g in d.groupby(level=0)]

demo

import json

my_dict = dict(name='flare', children=h(df.set_index(['Count', 'Rating', 'Name_ID'])))

json.dumps(my_dict)

'{"name": "flare", "children": [{"name": "4", "children": [{"name": "2", "children": [{"name": "ABCaaa", "children": {"name": "ABCaaa", "size": "40000"}}, {"name": "abc222", "children": {"name": "abc222", "size": "40000"}}]}]}, {"name": "5", "children": [{"name": "2", "children": [{"name": "ABC111", "children": {"name": "ABC111", "size": "33333"}}, {"name": "ABC121", "children": {"name": "ABC121", "size": "33333"}}, {"name": "XYZ", "children": {"name": "XYZ", "size": "33333"}}]}, {"name": "3", "children": {"name": "222", "size": "150000"}}]}, {"name": "9", "children": [{"name": "4", "children": {"name": "123", "size": "360000"}}]}, {"name": "10", "children": [{"name": "5", "children": {"name": "ABC", "size": "500000"}}]}]}'

my_dict

{'children': [{'children': [{'children': [{'children': {'name': 'ABCaaa',
        'size': '40000'},
       'name': 'ABCaaa'},
      {'children': {'name': 'abc222', 'size': '40000'}, 'name': 'abc222'}],
     'name': '2'}],
   'name': '4'},
  {'children': [{'children': [{'children': {'name': 'ABC111', 'size': '33333'},
       'name': 'ABC111'},
      {'children': {'name': 'ABC121', 'size': '33333'}, 'name': 'ABC121'},
      {'children': {'name': 'XYZ', 'size': '33333'}, 'name': 'XYZ'}],
     'name': '2'},
    {'children': {'name': '222', 'size': '150000'}, 'name': '3'}],
   'name': '5'},
  {'children': [{'children': {'name': '123', 'size': '360000'}, 'name': '4'}],
   'name': '9'},
  {'children': [{'children': {'name': 'ABC', 'size': '500000'}, 'name': '5'}],
   'name': '10'}],
 'name': 'flare'}
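
In the output above, each leaf is wrapped in an extra "children" object and all values are strings, which differs from the shape requested in the question. A minimal recursive sketch that produces the requested name/children/size layout for any list of grouping columns follows. The function name `nest` and its parameters are hypothetical, and it assumes integer grouping columns plus `Count` and `Rating` columns for the size formula.

```python
import json
import pandas as pd

def nest(frame, keys, leaf="Name_ID"):
    """Group `frame` by each column in `keys` in turn; return name/children nodes."""
    if not keys:
        # Leaf level: one node per row, size shared among the siblings.
        n = len(frame)
        return [
            {"name": str(name), "size": int(count) * int(rating) * 10000 // n}
            for name, count, rating in zip(frame[leaf], frame["Count"], frame["Rating"])
        ]
    first, rest = keys[0], keys[1:]
    return [
        # int(...) assumes the grouping columns hold integers.
        {"name": int(value), "children": nest(group, rest, leaf)}
        for value, group in frame.groupby(first)
    ]

df = pd.DataFrame(
    [("ABC", 10, 5), ("123", 9, 4), ("XYZ", 5, 2), ("ABC111", 5, 2),
     ("ABC121", 5, 2), ("222", 5, 3), ("abc222", 4, 2), ("ABCaaa", 4, 2)],
    columns=["Name_ID", "Count", "Rating"],
)

tree = {"name": "flare", "children": nest(df, ["Count", "Rating"])}
print(json.dumps(tree, indent=4))
```

Adding another nesting level would then only mean adding a column name to the list passed as `keys`.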

piRSquared answered Oct 25 '22 18:10


If I understand correctly, what you want to do is put a groupby into a nested JSON. If that is the case, you could use pandas groupby and cast the result into a nested list of lists, like so:

lol = pd.DataFrame(df.groupby(['Count','Rating'])\
               .apply(lambda x: list(x['Name_ID']))).reset_index().values.tolist()

lol should look something like this (the numbers show up as strings here, presumably because Count and Rating were read in as text):

[['10', '5', ['ABC']],
['4', '2', ['abc222', 'ABCaaa']],
['5', '2', ['XYZ ', 'ABC111', 'ABC121']],
['5', '3', ['222 ']],
['9', '4', ['123 ']]]

After that you could loop over lol to put it into a dict, but since you want to set nested items you'll have to use autovivification (check it out):

class autovividict(dict):
   def __missing__(self, key):
      value = self[key] = type(self)()
      return value

d = autovividict()
for l in lol:
    d[l[0]][l[1]] = l[2]

Now you can use the json package for printing and exporting:

print(json.dumps(d, indent=2))
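
The dictionary built above is keyed by Count and then Rating, without the "name" and "children" labels asked for in the question. A minimal sketch of one way to relabel it follows. The helper `to_tree` is hypothetical, the `lol` values are written as integers for illustration, and leaf sizes are left out.

```python
import json

class autovividict(dict):
    # Missing keys are created on first access as nested autovividicts.
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

lol = [[10, 5, ["ABC"]], [4, 2, ["abc222", "ABCaaa"]],
       [5, 2, ["XYZ", "ABC111", "ABC121"]], [5, 3, ["222"]], [9, 4, ["123"]]]

d = autovividict()
for count, rating, names in lol:
    d[count][rating] = names

def to_tree(node):
    """Turn nested dicts into name/children nodes; lists become leaf nodes."""
    if isinstance(node, dict):
        return [{"name": key, "children": to_tree(child)}
                for key, child in node.items()]
    return [{"name": name} for name in node]

tree = {"name": "flare", "children": to_tree(d)}
print(json.dumps(tree, indent=2))
```

The top-level order follows the insertion order of `lol`, so sort `lol` first if a particular order matters.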

In case you need more than one groupby, you could concat your groups with pandas, cast to lol, remove any NaNs, and then loop. Let me know if a full example would help.

Ezer K answered Oct 25 '22 19:10