I have a standard nested JSON file that looks like the one below. It is nested on multiple levels, and I have to eliminate all the nesting by creating new objects.
Nested JSON file:
{
"persons": [{
"id": "f4d322fa8f552",
"address": {
"building": "710",
"coord": "[123, 465]",
"street": "Avenue Road",
"zipcode": "12345"
},
"cuisine": "Chinese",
"grades": [{
"date": "2013-03-03T00:00:00.000Z",
"grade": "B",
"score": {
"x": 3,
"y": 2
}
}, {
"date": "2012-11-23T00:00:00.000Z",
"grade": "C",
"score": {
"x": 1,
"y": 22
}
}],
"name": "Shash"
}]
}
The new objects that need to be created:
persons
[
{
"id": "f4d322fa8f552",
"cuisine": "Chinese",
"name": "Shash"
}
]
persons_address
[
{
"id": "f4d322fa8f552",
"building": "710",
"coord": "[123, 465]",
"street": "Avenue Road",
"zipcode": "12345"
}
]
persons_grade
[
{
"id": "f4d322fa8f552",
"__index": "0",
"date": "2013-03-03T00:00:00.000Z",
"grade": "B"
},
{
"id": "f4d322fa8f552",
"__index": "1",
"date": "2012-11-23T00:00:00.000Z",
"grade": "C"
}
]
persons_grade_score
[
{
"id": "f4d322fa8f552",
"__index": "0",
"x": "3",
"y": "2"
},
{
"id": "f4d322fa8f552",
"__index": "1",
"x": "1",
"y": "22"
}
]
My approach: I used a normalise function to turn all the lists into dicts, and added another function that adds the id to all the nested dicts.
Now I am not able to traverse each level and create the new objects. Is there any way to do this?
The whole idea is that once the new objects are created, we can load them into a database.
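As an illustration of that last step, here is a minimal sketch that loads flattened lists into a database, assuming SQLite (via Python's built-in `sqlite3` module) as the target and one table per list. The `tables` data and the `load_tables` helper are hypothetical; column names are taken from the first row of each list, so every row in a list is assumed to have the same keys.

```python
import sqlite3

# Hypothetical flattened output, shaped like the "new objects" in the question.
tables = {
    "persons": [{"id": "f4d322fa8f552", "cuisine": "Chinese", "name": "Shash"}],
    "persons_grade": [
        {"id": "f4d322fa8f552", "__index": 0,
         "date": "2013-03-03T00:00:00.000Z", "grade": "B"},
        {"id": "f4d322fa8f552", "__index": 1,
         "date": "2012-11-23T00:00:00.000Z", "grade": "C"},
    ],
}


def load_tables(conn, tables):
    """Create one table per flattened list and insert its rows.

    Assumes every row in a list has the same keys as the first row.
    """
    for name, rows in tables.items():
        if not rows:
            continue
        columns = list(rows[0].keys())
        # Identifiers cannot be bound as parameters, so they are quoted instead.
        col_sql = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        # SQLite allows columns without declared types.
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" ({col_sql})')
        conn.executemany(
            f'INSERT INTO "{name}" ({col_sql}) VALUES ({placeholders})',
            [tuple(row[c] for c in columns) for row in rows],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_tables(conn, tables)
print(conn.execute('SELECT grade FROM "persons_grade" ORDER BY "__index"').fetchall())
# [('B',), ('C',)]
```

With a real database you would add column types and keys (for example `id` plus `__index` as a composite key on the child tables); they are left out here to keep the sketch short.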
Objects can be nested inside other objects. Each nested object must have a unique access path. The same field name can occur in nested objects in the same document.
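To make the "unique access path" idea concrete, here is a small sketch that yields the path to every scalar value in a nested structure. The path notation (dict keys joined with `.`, list positions written as `[i]`) and the `access_paths` name are assumptions chosen for illustration; the point is that a field name such as `grade` can repeat while each full path stays unique.

```python
def access_paths(value, prefix=""):
    """Yield (path, leaf_value) pairs for every scalar in a nested structure.

    Dict keys are joined with "." and list positions are written as [i],
    so each leaf gets a unique path even if field names repeat.
    """
    if isinstance(value, dict):
        for key, sub in value.items():
            yield from access_paths(sub, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for i, sub in enumerate(value):
            yield from access_paths(sub, f"{prefix}[{i}]")
    else:
        yield prefix, value


doc = {"grades": [{"grade": "B", "score": {"x": 3}},
                  {"grade": "C", "score": {"x": 1}}]}
for path, leaf in access_paths(doc):
    print(path, leaf)
# grades[0].grade B
# grades[0].score.x 3
# grades[1].grade C
# grades[1].score.x 1
```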
Nested JSON is simply JSON in which a large share of the values are themselves JSON objects (or arrays of objects). Compared with flat JSON, nested JSON can be clearer because it groups related fields into separate layers, which can make it easier to maintain.
To delete a JSON object from a list: parse the JSON into a Python list of dictionaries, use the enumerate() function to iterate over the list, check whether each dictionary is the one you want to remove, and use the pop() method to remove the matching dict.
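Those steps look like this in practice. The sample records and the match on `"id" == "a"` are hypothetical; the loop breaks right after `pop()` so the list is not modified while iteration continues.

```python
import json

# Parse the JSON into a Python list of dictionaries.
records = json.loads('[{"id": "a", "grade": "B"}, {"id": "b", "grade": "C"}]')

# Find the index of the matching dict with enumerate(), then remove it with pop().
# Breaking right after pop() avoids mutating the list while still iterating over it.
for i, record in enumerate(records):
    if record["id"] == "a":
        removed = records.pop(i)
        break

print(removed)  # {'id': 'a', 'grade': 'B'}
print(records)  # [{'id': 'b', 'grade': 'C'}]
```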
Here is a generic solution that does what you need. It recursively loops through all the values of each object in the top-level "persons" list and decides what to do based on the type of each value.
For every value that is neither a dict nor a list, it puts the value into the object for the current level.
If it finds a dict or a list, it recursively does the same thing again, finding more scalar values, lists or dicts.
Using collections.defaultdict also makes it easy to populate an unknown number of lists, one per key, in a single dictionary, which gives you the four top-level objects you want.
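Here is the defaultdict part in isolation: rows are grouped into lists by table name without declaring the table names first. The `rows` data is a hypothetical, trimmed version of what the recursion produces.

```python
from collections import defaultdict

# Each flattened row is tagged with the name of the table it belongs to.
rows = [
    ("persons", {"id": "f4d322fa8f552", "name": "Shash"}),
    ("persons_grades", {"id": "f4d322fa8f552", "__index": 0, "grade": "B"}),
    ("persons_grades", {"id": "f4d322fa8f552", "__index": 1, "grade": "C"}),
]

# defaultdict(list) creates an empty list the first time a key is used,
# so there is no need to know the table names in advance.
collected = defaultdict(list)
for table, row in rows:
    collected[table].append(row)

print(sorted(collected))                 # ['persons', 'persons_grades']
print(len(collected["persons_grades"]))  # 2
```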
from collections import defaultdict


class DictFlattener(object):
    def __init__(self, object_id_key, object_name):
        """Constructor.

        :param object_id_key: String key that identifies each base object.
        :param object_name: String name given to the base object in data.
        """
        self._object_id_key = object_id_key
        self._object_name = object_name
        # Store each of the top-level results lists.
        self._collected_results = None

    def parse(self, data):
        """Parse the given nested dictionary data into separate lists.

        Each nested dictionary is transformed into its own list of objects,
        associated with the original object via the object id.

        :param data: Dictionary of data to parse.
        :returns: Single dictionary containing the resulting lists of
            objects, where each key is the object name combined with the
            list name via an underscore.
        """
        self._collected_results = defaultdict(list)
        for value_to_parse in data[self._object_name]:
            object_id = value_to_parse[self._object_id_key]
            parsed_object = {}
            for key, value in value_to_parse.items():
                sub_object_name = self._object_name + "_" + key
                parsed_value = self._parse_value(
                    value,
                    object_id,
                    sub_object_name,
                )
                # Compare against None so falsy scalars such as 0 or "" are kept
                # (JSON null values are still dropped).
                if parsed_value is not None:
                    parsed_object[key] = parsed_value
            self._collected_results[self._object_name].append(parsed_object)
        return self._collected_results

    def _parse_value(self, value_to_parse, object_id, current_object_name, index=None):
        """Parse some value of an unknown type.

        If it's a list or a dict, keep parsing, otherwise return it as-is.

        :param value_to_parse: Value to parse.
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position in the enclosing list, if any.
        :returns: None if value_to_parse is a dict or a list, otherwise returns
            value_to_parse.
        """
        if isinstance(value_to_parse, dict):
            self._parse_dict(
                value_to_parse,
                object_id,
                current_object_name,
                index=index,
            )
        elif isinstance(value_to_parse, list):
            self._parse_list(
                value_to_parse,
                object_id,
                current_object_name,
            )
        else:
            return value_to_parse

    def _parse_dict(self, dict_to_parse, object_id, current_object_name,
                    index=None):
        """Parse some value of a dict type and store it in self._collected_results.

        :param dict_to_parse: Dict to parse.
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position in the enclosing list, if any; it is also
            passed down so nested dicts (e.g. "score") carry the same __index.
        """
        parsed_dict = {
            self._object_id_key: object_id,
        }
        if index is not None:
            parsed_dict["__index"] = index
        for key, value in dict_to_parse.items():
            sub_object_name = current_object_name + "_" + key
            parsed_value = self._parse_value(
                value,
                object_id,
                sub_object_name,
                index=index,
            )
            if parsed_value is not None:
                parsed_dict[key] = parsed_value
        self._collected_results[current_object_name].append(parsed_dict)

    def _parse_list(self, list_to_parse, object_id, current_object_name):
        """Parse some value of a list type and store it in self._collected_results.

        Only dict (or list) elements are stored; scalar list items are discarded.

        :param list_to_parse: List to parse.
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        """
        for index, sub_dict in enumerate(list_to_parse):
            self._parse_value(
                sub_dict,
                object_id,
                current_object_name,
                index=index,
            )
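Then to use it (test_data is the dictionary loaded from the nested JSON file, for example with json.load):
parser = DictFlattener("id", "persons")
results = parser.parse(test_data)
The keys of results are "persons", "persons_address", "persons_grades" and "persons_grades_score"; the names come from the JSON keys, so the list is "persons_grades" rather than "persons_grade".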
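If you prefer a single function over a class, the same recursion can be written more compactly. This is an alternative sketch of the technique above, not the original answer's code; `flatten`, `walk` and the `root`/`id_key` parameters are hypothetical names. Like DictFlattener, it names tables from the JSON keys and skips lists of scalars, but it keeps falsy scalar values.

```python
from collections import defaultdict


def flatten(data, root="persons", id_key="id"):
    """Split each object under data[root] into one list per nesting level.

    Scalars stay in the current row; a nested dict becomes a row in the
    table "<parent>_<key>"; each dict inside a list becomes its own row,
    tagged with "__index". Every row carries the root object's id.
    Lists of scalars are skipped.
    """
    tables = defaultdict(list)

    def walk(obj, table, obj_id, index=None):
        row = {id_key: obj_id}
        if index is not None:
            row["__index"] = index
        for key, value in obj.items():
            child = f"{table}_{key}"
            if isinstance(value, dict):
                # Nested dict: same list position, one level deeper.
                walk(value, child, obj_id, index)
            elif isinstance(value, list):
                for i, item in enumerate(value):
                    if isinstance(item, dict):
                        walk(item, child, obj_id, i)
            else:
                row[key] = value
        tables[table].append(row)

    for obj in data[root]:
        walk(obj, root, obj[id_key])
    return dict(tables)


sample = {"persons": [{
    "id": "f4d322fa8f552",
    "address": {"building": "710", "street": "Avenue Road"},
    "cuisine": "Chinese",
    "grades": [
        {"date": "2013-03-03T00:00:00.000Z", "grade": "B", "score": {"x": 3, "y": 2}},
        {"date": "2012-11-23T00:00:00.000Z", "grade": "C", "score": {"x": 1, "y": 22}},
    ],
    "name": "Shash",
}]}

result = flatten(sample)
print(sorted(result))
# ['persons', 'persons_address', 'persons_grades', 'persons_grades_score']
print(result["persons_grades_score"][1])
# {'id': 'f4d322fa8f552', '__index': 1, 'x': 1, 'y': 22}
```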
Here is a sketch to help you out after parsing the JSON file into data (as described in "Parsing values from a JSON file?"). It collects the field names that belong to each level:
top_level = []
all_second_level = []
for person in data['persons']:
    for key, val in person.items():
        if isinstance(val, dict):
            # A nested dict becomes one second-level entity: collect its keys.
            all_second_level.append(list(val.keys()))
        elif isinstance(val, list):
            second_level = []
            for item in val:
                # Record the list position, then the scalar keys of the item.
                second_level_entity = ['__index']
                for key1, val1 in item.items():
                    if not isinstance(val1, (dict, list)):
                        second_level_entity.append(key1)
                    # else: append it to a third-level entity
                second_level.append(second_level_entity)
            all_second_level.append(second_level)
        else:
            # Scalars belong to the top-level entity.
            top_level.append(key)
# in the end append id to all items of entities at each level
If the structure is known in advance, the four lists can also be built explicitly:
# create 4 empty lists
persons = []
persons_address = []
persons_grade = []
persons_grade_score = []

# go through all your data and put the correct information in each list
# (yourdict is the dictionary loaded from the JSON file)
for person in yourdict['persons']:
    persons.append({
        'id': person['id'],
        'cuisine': person['cuisine'],
        'name': person['name'],
    })

    _address = person['address'].copy()
    _address['id'] = person['id']
    persons_address.append(_address)

    persons_grade.extend({
        'id': person['id'],
        '__index': n,
        'date': g['date'],
        'grade': g['grade'],
    } for n, g in enumerate(person['grades']))

    # x and y live inside each grade's nested "score" object
    persons_grade_score.extend({
        'id': person['id'],
        '__index': n,
        'x': g['score']['x'],
        'y': g['score']['y'],
    } for n, g in enumerate(person['grades']))