Eliminate nesting by creating new objects from json

I have a standard nested JSON file that looks like the one below. It is nested several levels deep, and I have to eliminate all the nesting by creating new objects.

Nested JSON file:

{
"persons": [{
    "id": "f4d322fa8f552",
    "address": {
        "building": "710",
        "coord": "[123, 465]",
        "street": "Avenue Road",
        "zipcode": "12345"
    },
    "cuisine": "Chinese",
    "grades": [{
        "date": "2013-03-03T00:00:00.000Z",
        "grade": "B",
        "score": {
          "x": 3,
          "y": 2
        }
    }, {
        "date": "2012-11-23T00:00:00.000Z",
        "grade": "C",
        "score": {
          "x": 1,
          "y": 22
        }
    }],
    "name": "Shash"
}]
}

The new objects that need to be created:

persons 
[
{
"id": "f4d322fa8f552",
"cuisine": "Chinese",
"name": "Shash"
}
]

persons_address
[
{
"id": "f4d322fa8f552",
"building": "710",
"coord": "[123, 465]",
"street": "Avenue Road",
"zipcode": "12345"
}
]

persons_grade
[
{
"id": "f4d322fa8f552",
"__index": "0",
"date": "2013-03-03T00:00:00.000Z",
"grade": "B"
},
{
"id": "f4d322fa8f552",
"__index": "1",
"date": "2012-11-23T00:00:00.000Z",
"grade": "C"
}
]

persons_grade_score
[
{
"id": "f4d322fa8f552",
"__index": "0",
"x": "3",
"y": "2"
},
{
"id": "f4d322fa8f552",
"__index": "1",
"x": "1",
"y": "22"
}
]

My approach: I used a normalise function to convert all the lists into dicts, then added another function that adds the id to every nested dict.

Now I am unable to traverse each level and create the new objects. Is there a way to do this?

The idea is that once the new objects are created, we can load them into a database.

asked Jul 16 '18 19:07

Shash

3 Answers

Concepts

Here is a generic solution that does what you need. It recursively loops through all the values of each object in the top-level "persons" list and decides what to do based on the type of each value.

Every value that is neither a dict nor a list is placed directly in the flat object for the current level.

When it finds a dict or a list, it recursively does the same thing again, finding more plain values, lists, or dicts.

Using collections.defaultdict also makes it easy to populate an unknown number of lists, one per key, in a single dictionary, which gives you the 4 top-level objects you want.
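To illustrate the defaultdict point in isolation, here is a minimal sketch. The `discovered` pairs are hypothetical stand-ins for the rows the recursion would find; only `collections.defaultdict` itself comes from the answer.

```python
from collections import defaultdict

# Hypothetical (name, row) pairs standing in for what the recursion discovers.
discovered = [
    ("persons_address", {"id": "f4d322fa8f552", "building": "710"}),
    ("persons_grades", {"id": "f4d322fa8f552", "__index": 0, "grade": "B"}),
    ("persons_grades", {"id": "f4d322fa8f552", "__index": 1, "grade": "C"}),
]

# A missing key is created as an empty list on first access,
# so no "if name not in results" check is needed.
results = defaultdict(list)
for name, row in discovered:
    results[name].append(row)

print(sorted(results))                 # ['persons_address', 'persons_grades']
print(len(results["persons_grades"]))  # 2
```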

Code example

from collections import defaultdict

class DictFlattener(object):
    def __init__(self, object_id_key, object_name):
        """Constructor.

        :param object_id_key: String key that identifies each base object
        :param object_name: String name given to the base object in data.

        """
        self._object_id_key = object_id_key
        self._object_name = object_name

        # Store each of the top-level results lists.
        self._collected_results = None

    def parse(self, data):
        """Parse the given nested dictionary data into separate lists.

        Each nested dictionary is transformed into its own list of objects,
        associated with the original object via the object id.

        :param data: Dictionary of data to parse.

        :returns: Single dictionary containing the resulting lists of
            objects, where each key is the object name combined with the
            list name via an underscore.

        """

        self._collected_results = defaultdict(list)

        for value_to_parse in data[self._object_name]:
            object_id = value_to_parse[self._object_id_key]
            parsed_object = {}

            for key, value in value_to_parse.items():
                sub_object_name = self._object_name + "_" + key
                parsed_value = self._parse_value(
                    value,
                    object_id,
                    sub_object_name,
                )
                # None means the value was a dict or list and has been
                # stored separately. Comparing against None (rather than
                # testing truthiness) keeps falsy values such as 0 or "".
                if parsed_value is not None:
                    parsed_object[key] = parsed_value

            self._collected_results[self._object_name].append(parsed_object)

        return self._collected_results

    def _parse_value(self, value_to_parse, object_id, current_object_name,
                     index=None):
        """Parse some value of an unknown type.

        If it's a list or a dict, keep parsing, otherwise return it as-is.

        :param value_to_parse: Value to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position in the enclosing list, if there is one.

        :returns: None if value_to_parse is a dict or a list, otherwise returns
            value_to_parse.

        """
        if isinstance(value_to_parse, dict):
            self._parse_dict(
                value_to_parse,
                object_id,
                current_object_name,
                index=index,
            )
        elif isinstance(value_to_parse, list):
            self._parse_list(
                value_to_parse,
                object_id,
                current_object_name,
            )
        else:
            return value_to_parse

    def _parse_dict(self, dict_to_parse, object_id, current_object_name,
                    index=None):
        """Parse some value of a dict type and store it in self._collected_results.

        :param dict_to_parse: Dict to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position in the enclosing list, if there is one.

        """
        parsed_dict = {
            self._object_id_key: object_id,
        }
        if index is not None:
            parsed_dict["__index"] = index

        for key, value in dict_to_parse.items():
            sub_object_name = current_object_name + "_" + key
            parsed_value = self._parse_value(
                value,
                object_id,
                sub_object_name,
                index=index,
            )
            # Keep falsy values such as 0; skip only nested dicts/lists.
            if parsed_value is not None:
                parsed_dict[key] = parsed_value

        self._collected_results[current_object_name].append(parsed_dict)

    def _parse_list(self, list_to_parse, object_id, current_object_name):
        """Parse some value of a list type and store it in self._collected_results.

        :param list_to_parse: List to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.

        """
        for index, sub_dict in enumerate(list_to_parse):
            self._parse_value(
                sub_dict,
                object_id,
                current_object_name,
                index=index,
            )

Then to use it, where test_data is the dictionary parsed from your JSON file:

parser = DictFlattener("id", "persons")
results = parser.parse(test_data)

Notes

Note that there were some inconsistencies between your example data and your expected output; for instance, the scores are ints in the input but strings in the expected result. You'll need to adjust for those when you compare the actual output to the expected one.
There's always more refactoring one could do, or it could be made more functional rather than being a class. But hopefully looking at this helps you understand how to do it.
As @jbernardo said, if you will be inserting these into a relational database, they shouldn't all just have "id" as the key; it should be "person_id".
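As a rough illustration of the "more functional" variant mentioned in the notes, the same recursion can be sketched as a single function with a nested helper. The names `flatten`, `walk`, and `sample` are hypothetical, and the sketch assumes every list item is a dict, as in the question's data.

```python
from collections import defaultdict


def flatten(data, object_name, id_key="id"):
    """Split nested dicts/lists under data[object_name] into flat row lists."""
    results = defaultdict(list)

    def walk(node, object_id, name, index=None):
        # Every row carries the top-level id, plus the list position if any.
        row = {id_key: object_id}
        if index is not None:
            row["__index"] = index
        for key, value in node.items():
            child_name = name + "_" + key
            if isinstance(value, dict):
                walk(value, object_id, child_name, index)
            elif isinstance(value, list):
                for i, item in enumerate(value):
                    # Assumes list items are dicts, as in the question's data.
                    walk(item, object_id, child_name, i)
            else:
                row[key] = value
        results[name].append(row)

    for obj in data[object_name]:
        walk(obj, obj[id_key], object_name)
    return dict(results)


sample = {
    "persons": [{
        "id": "f4d322fa8f552",
        "address": {"building": "710", "zipcode": "12345"},
        "grades": [
            {"grade": "B", "score": {"x": 3, "y": 2}},
            {"grade": "C", "score": {"x": 1, "y": 22}},
        ],
        "name": "Shash",
    }]
}

tables = flatten(sample, "persons")
print(sorted(tables))
# ['persons', 'persons_address', 'persons_grades', 'persons_grades_score']
print(tables["persons_grades_score"][1])
# {'id': 'f4d322fa8f552', '__index': 1, 'x': 1, 'y': 22}
```

Note that the result keys follow the input key names, so the grades end up under "persons_grades" rather than "persons_grade".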

answered Oct 12 '22 09:10

Matthew Horst

Here is pseudocode to help you out once you have parsed the JSON file, as described in "Parsing values from a JSON file?":

# `data` is the dictionary obtained from parsing the JSON file.
top_level = []
all_second_level = []
all_third_level = []

for person in data['persons']:
    # Plain values (neither dict nor list) stay on the top-level entity.
    top_level.append({
        key: val for key, val in person.items()
        if not isinstance(val, (dict, list))
    })

    for key, val in person.items():
        if isinstance(val, dict):
            # A nested dict becomes a single second-level entity with the id.
            second_level = [{'id': person['id'], **val}]
            all_second_level.append(second_level)
        elif isinstance(val, list):
            second_level = []
            for index, item in enumerate(val):
                # Tag each list element with the parent id and its index.
                second_level_entity = {'id': person['id'], '__index': index}
                for key1, val1 in item.items():
                    if not isinstance(val1, dict):
                        second_level_entity[key1] = val1
                    else:
                        # Dicts inside list items become third-level entities.
                        all_third_level.append(
                            {'id': person['id'], '__index': index, **val1}
                        )
                second_level.append(second_level_entity)
            all_second_level.append(second_level)

answered Oct 12 '22 11:10

Ishan Srivastava

# create 4 empty lists
persons = []
persons_address = []
persons_grade = []
persons_grade_score = []


# go through all your data and put the correct information in each list
for data in yourdict['persons']:
    persons.append({
        'id': data['id'],
        'cuisine': data['cuisine'],
        'name': data['name'],
    })

    _address = data['address'].copy()
    _address['id'] = data['id']
    persons_address.append(_address)

    persons_grade.extend({
        'id': data['id'],
        '__index': n,
        'date': g['date'],
        'grade': g['grade'],
    } for n, g in enumerate(data['grades']))

    # x and y live inside each grade's nested "score" dict
    persons_grade_score.extend({
        'id': data['id'],
        '__index': n,
        'x': g['score']['x'],
        'y': g['score']['y'],
    } for n, g in enumerate(data['grades']))
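The question's stated goal is to load the new objects into a database. A minimal sketch of that step, using the standard-library sqlite3 module with an in-memory database, might look like the following. The table layout and the column names `person_id` and `idx` are assumptions made for illustration, and the sample rows simply repeat values from the question.

```python
import sqlite3

# Sample rows shaped like the flat lists (values taken from the question).
persons = [{"id": "f4d322fa8f552", "cuisine": "Chinese", "name": "Shash"}]
persons_grade = [
    {"id": "f4d322fa8f552", "__index": 0,
     "date": "2013-03-03T00:00:00.000Z", "grade": "B"},
    {"id": "f4d322fa8f552", "__index": 1,
     "date": "2012-11-23T00:00:00.000Z", "grade": "C"},
]

conn = sqlite3.connect(":memory:")  # in-memory database, illustration only
conn.execute(
    "CREATE TABLE persons (id TEXT PRIMARY KEY, cuisine TEXT, name TEXT)"
)
conn.execute(
    "CREATE TABLE persons_grade "
    "(person_id TEXT, idx INTEGER, date TEXT, grade TEXT)"
)

# Named placeholders (:key) let each dict be passed directly as a row.
conn.executemany("INSERT INTO persons VALUES (:id, :cuisine, :name)", persons)

# Positional placeholders, with "id" and "__index" mapped to the
# hypothetical person_id and idx columns.
conn.executemany(
    "INSERT INTO persons_grade VALUES (?, ?, ?, ?)",
    [(g["id"], g["__index"], g["date"], g["grade"]) for g in persons_grade],
)
conn.commit()

# Join the child rows back to their parent via the shared id.
rows = conn.execute(
    "SELECT p.name, g.idx, g.grade FROM persons p "
    "JOIN persons_grade g ON g.person_id = p.id ORDER BY g.idx"
).fetchall()
print(rows)  # [('Shash', 0, 'B'), ('Shash', 1, 'C')]
```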

answered Oct 12 '22 09:10

nosklo

Tags:

python

json

dictionary