
Eliminate nesting by creating new objects from json

I have a standard nested JSON file that looks like the one below. It is nested on multiple levels, and I have to eliminate all the nesting by creating new objects.

Nested JSON file:

{
"persons": [{
    "id": "f4d322fa8f552",
    "address": {
        "building": "710",
        "coord": "[123, 465]",
        "street": "Avenue Road",
        "zipcode": "12345"
    },
    "cuisine": "Chinese",
    "grades": [{
        "date": "2013-03-03T00:00:00.000Z",
        "grade": "B",
        "score": {
          "x": 3,
          "y": 2
        }
    }, {
        "date": "2012-11-23T00:00:00.000Z",
        "grade": "C",
        "score": {
          "x": 1,
          "y": 22
        }
    }],
    "name": "Shash"
}]
}

The new objects that need to be created:

persons 
[
{
"id": "f4d322fa8f552",
"cuisine": "Chinese",
"name": "Shash"
}
]

persons_address
[
{
"id": "f4d322fa8f552",
"building": "710",
"coord": "[123, 465]",
"street": "Avenue Road",
"zipcode": "12345"
}
]

persons_grade
[
{
"id": "f4d322fa8f552",
"__index": "0",
"date": "2013-03-03T00:00:00.000Z",
"grade": "B"
},
{
"id": "f4d322fa8f552",
"__index": "1",
"date": "2012-11-23T00:00:00.000Z",
"grade": "C"
}
]

persons_grade_score
[
{
"id": "f4d322fa8f552",
"__index": "0",
"x": "3",
"y": "2"
},
{
"id": "f4d322fa8f552",
"__index": "1",
"x": "1",
"y": "22"
}
]

My approach: I used a normalise function to turn all the lists into dicts, and added another function that adds the id to all the nested dicts.

Now I am not able to traverse each level and create the new objects. Is there any way to do this?

The whole idea is that, once the new objects are created, we can load them into a database.

Shash asked Jul 16 '18 19:07



3 Answers

Concepts

Here is a generic solution that does what you need. It loops over every object in the top-level "persons" list and recursively walks through that object's values, deciding what to do based on the type of each value.

Every value that is neither a dict nor a list is stored directly in the flat object for the current level.

If a value is a dictionary or a list, the same procedure is applied to it recursively, and whatever it contains is collected in a separate list named after its path (for example "persons_address").

Using collections.defaultdict also lets us easily populate an unknown number of lists, one per key, in a single dictionary, which gives us the 4 top-level objects you want.
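
As a small illustration of the defaultdict point, the sketch below shows how a list is created automatically the first time a key is used. The rows and list names are made up for the example and are not part of the solution's code.

```python
from collections import defaultdict

# Hypothetical rows, each tagged with the name of the list it belongs to.
rows = [
    ("persons_address", {"id": "f4d322fa8f552", "building": "710"}),
    ("persons_grades", {"id": "f4d322fa8f552", "__index": 0, "grade": "B"}),
    ("persons_grades", {"id": "f4d322fa8f552", "__index": 1, "grade": "C"}),
]

collected = defaultdict(list)
for list_name, row in rows:
    # No need to check whether collected[list_name] exists yet:
    # defaultdict creates an empty list on first access.
    collected[list_name].append(row)

print(sorted(collected))                 # names of the lists that were created
print(len(collected["persons_grades"]))  # number of rows in that list
```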

Code example

from collections import defaultdict

class DictFlattener(object):
    """Flatten nested dictionaries into separate lists of flat objects."""

    def __init__(self, object_id_key, object_name):
        """Constructor.

        :param object_id_key: String key that identifies each base object
        :param object_name: String name given to the base object in data.

        """
        self._object_id_key = object_id_key
        self._object_name = object_name

        # Store each of the top-level results lists.
        self._collected_results = None

    def parse(self, data):
        """Parse the given nested dictionary data into separate lists.

        Each nested dictionary is transformed into its own list of objects,
        associated with the original object via the object id.

        :param data: Dictionary of data to parse.

        :returns: Single dictionary containing the resulting lists of
            objects, where each key is the object name combined with the
            list name via an underscore.

        """

        self._collected_results = defaultdict(list)

        for value_to_parse in data[self._object_name]:
            object_id = value_to_parse[self._object_id_key]
            parsed_object = {}

            for key, value in value_to_parse.items():
                sub_object_name = self._object_name + "_" + key
                parsed_value = self._parse_value(
                    value,
                    object_id,
                    sub_object_name,
                )
                # Compare against None so falsy values such as 0 or ""
                # are kept.
                if parsed_value is not None:
                    parsed_object[key] = parsed_value

            self._collected_results[self._object_name].append(parsed_object)

        return self._collected_results

    def _parse_value(self, value_to_parse, object_id, current_object_name,
                     index=None):
        """Parse some value of an unknown type.

        If it's a list or a dict, keep parsing, otherwise return it as-is.

        :param value_to_parse: Value to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position of the enclosing list item, if any.

        :returns: None if value_to_parse is a dict or a list, otherwise returns
            value_to_parse.

        """
        if isinstance(value_to_parse, dict):
            self._parse_dict(
                value_to_parse,
                object_id,
                current_object_name,
                index=index,
            )
        elif isinstance(value_to_parse, list):
            self._parse_list(
                value_to_parse,
                object_id,
                current_object_name,
            )
        else:
            return value_to_parse

    def _parse_dict(self, dict_to_parse, object_id, current_object_name,
                    index=None):
        """Parse some value of a dict type and store it in self._collected_results.

        :param dict_to_parse: Dict to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position of the enclosing list item, if any.

        """
        parsed_dict = {
            self._object_id_key: object_id,
        }
        if index is not None:
            parsed_dict["__index"] = index

        for key, value in dict_to_parse.items():
            sub_object_name = current_object_name + "_" + key
            parsed_value = self._parse_value(
                value,
                object_id,
                sub_object_name,
                index=index,
            )
            if parsed_value is not None:
                parsed_dict[key] = parsed_value

        self._collected_results[current_object_name].append(parsed_dict)

    def _parse_list(self, list_to_parse, object_id, current_object_name):
        """Parse some value of a list type and store it in self._collected_results.

        :param list_to_parse: List to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.

        """
        for index, sub_dict in enumerate(list_to_parse):
            self._parse_value(
                sub_dict,
                object_id,
                current_object_name,
                index=index,
            )

Then to use it (test_data is the parsed JSON from the question, e.g. the result of json.load):

parser = DictFlattener("id", "persons")
results = parser.parse(test_data)

Notes

  1. There were some inconsistencies between your example data and your expected output; for example, the scores are ints in the input but strings in the expected output. You'll need to account for those when you compare the actual result with the expected one.
  2. There's always more refactoring one could do, and it could be written as plain functions rather than as a class. Hopefully this helps you understand how to do it.
  3. As @jbernardo said, if you will be inserting these into a relational database, the nested objects shouldn't all just use "id" as the key; it should be "person_id".
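
To make notes 2 and 3 more concrete, here is a rough functional sketch of the same idea, not a drop-in replacement for the class above. The function name flatten and its parameters (name, id_key, parent_key) are hypothetical, chosen only for this illustration; parent_key lets the child rows carry "person_id" instead of "id". Scalars that sit directly inside a list are ignored in this sketch.

```python
from collections import defaultdict


def flatten(records, name, id_key="id", parent_key=None):
    """Split nested records into flat lists keyed by list name.

    parent_key is the key under which child rows store the id of the
    top-level record; it defaults to id_key.
    """
    parent_key = parent_key or id_key
    tables = defaultdict(list)

    def walk(value, table, object_id, index):
        if isinstance(value, list):
            # Each list element becomes its own row and remembers its position.
            for i, item in enumerate(value):
                walk(item, table, object_id, i)
        elif isinstance(value, dict):
            row = {parent_key: object_id}
            if index is not None:
                row["__index"] = index
            for key, sub_value in value.items():
                if isinstance(sub_value, (dict, list)):
                    walk(sub_value, table + "_" + key, object_id, index)
                else:
                    row[key] = sub_value
            tables[table].append(row)
        # Scalars found directly inside a list are skipped in this sketch.

    for record in records:
        top_row = {}
        for key, value in record.items():
            if isinstance(value, (dict, list)):
                walk(value, name + "_" + key, record[id_key], None)
            else:
                top_row[key] = value
        tables[name].append(top_row)
    return dict(tables)


# Trimmed-down version of the data from the question.
sample = {"persons": [{
    "id": "f4d322fa8f552",
    "address": {"building": "710", "zipcode": "12345"},
    "grades": [
        {"grade": "B", "score": {"x": 3, "y": 2}},
        {"grade": "C", "score": {"x": 1, "y": 22}},
    ],
    "name": "Shash",
}]}

tables = flatten(sample["persons"], "persons", parent_key="person_id")
for table_name, table_rows in tables.items():
    print(table_name, table_rows)
```

Note that the list names come straight from the JSON keys, so the grades end up under "persons_grades" rather than "persons_grade".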
Matthew Horst answered Oct 12 '22 09:10


Here is a rough sketch to help you out. Apply it after parsing the JSON file (see the question "Parsing values from a JSON file?"):

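# `data` is the dict obtained from parsing the JSON file
top_level = []
all_second_level = []
all_third_level = []

for person in data['persons']:
    # scalar values stay on the top-level entity
    top_level.append({
        key: val for key, val in person.items()
        if not isinstance(val, (dict, list))
    })

    for key, val in person.items():
        if isinstance(val, dict):
            # a nested dict becomes one second-level entity
            second_level = {'id': person['id']}
            for key1, val1 in val.items():
                second_level[key1] = val1
            all_second_level.append((key, second_level))
        elif isinstance(val, list):
            # assumes every item in the list is a dict
            for index, item in enumerate(val):
                # the id and the index go on the second_level_entity
                second_level_entity = {'id': person['id'], '__index': index}
                for key1, val1 in item.items():
                    if not isinstance(val1, dict):
                        second_level_entity[key1] = val1
                    else:
                        # a dict inside a list item becomes a third-level entity
                        third_level_entity = {'id': person['id'], '__index': index}
                        third_level_entity.update(val1)
                        all_third_level.append((key + '_' + key1, third_level_entity))
                all_second_level.append((key, second_level_entity))

# each entry in the second- and third-level lists is a (name, entity) pair,
# and the id is added to the entities at each level as they are built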
Ishan Srivastava answered Oct 12 '22 11:10


# create 4 empty lists
persons = []
persons_address = []
persons_grade = []
persons_grade_score = []


# go through all your data and put the correct information in each list
for data in yourdict['persons']:
    persons.append({
        'id': data['id'],
        'cuisine': data['cuisine'],
        'name': data['name'],
    })

    _address = data['address'].copy()
    _address['id'] = data['id']
    persons_address.append(_address)

    persons_grade.extend({
        'id': data['id'],
        '__index': n,
        'date': g['date'],
        'grade': g['grade'],
    } for n, g in enumerate(data['grades']))

    # x and y live inside the nested "score" dict of each grade
    persons_grade_score.extend({
        'id': data['id'],
        '__index': n,
        'x': g['score']['x'],
        'y': g['score']['y'],
    } for n, g in enumerate(data['grades']))
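
Once lists like these exist, loading them into a database, which is the stated goal in the question, could look roughly like the sketch below. It uses Python's built-in sqlite3 module; the table layout, the column names person_id and idx, and the in-memory database are assumptions made purely for illustration.

```python
import sqlite3

# Sample rows in the shape of the lists built above (values from the question).
persons = [{"id": "f4d322fa8f552", "cuisine": "Chinese", "name": "Shash"}]
persons_grade = [
    {"id": "f4d322fa8f552", "__index": 0,
     "date": "2013-03-03T00:00:00.000Z", "grade": "B"},
    {"id": "f4d322fa8f552", "__index": 1,
     "date": "2012-11-23T00:00:00.000Z", "grade": "C"},
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE persons (id TEXT PRIMARY KEY, cuisine TEXT, name TEXT)"
)
conn.execute(
    "CREATE TABLE persons_grade ("
    "person_id TEXT REFERENCES persons(id), idx INTEGER, date TEXT, grade TEXT)"
)

# Named placeholders (:key) are filled from each dict.
conn.executemany("INSERT INTO persons VALUES (:id, :cuisine, :name)", persons)

# Positional placeholders, with the values pulled out of each dict in order.
conn.executemany(
    "INSERT INTO persons_grade VALUES (?, ?, ?, ?)",
    [(g["id"], g["__index"], g["date"], g["grade"]) for g in persons_grade],
)
conn.commit()

# Join the two tables back together to check the relationship.
rows = conn.execute(
    "SELECT p.name, g.grade FROM persons p "
    "JOIN persons_grade g ON g.person_id = p.id ORDER BY g.idx"
).fetchall()
print(rows)
```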
nosklo answered Oct 12 '22 09:10