Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find all the keys and keys of the keys in a nested dictionary

I'm trying to find all the attributes of the data in a nested dictionary in Python. Some objects may have multiple levels in their key definition. How can I find the header of such a complicated nested data (if we think as a table structure). Here are very few lines of my data to see how it looks like:

{"MessageType": "SALES.HOLDCREATED", "Event": {"Id": "ZWbDoMKQw6HDjFzCo8KuwpNmwofCjl7Co8OPwpDCncOSXMOdccKTZVVWZWbCnA==", "RefInfo": {"TId": {"Id": "ZMKXwpbClsOhwpNiw5E="}, "UserId": {"Id": "wpzCksKWwpbCpMKTYsKeZMKZbA=="}, "SentUtc": "2013-04-28T16:59:48.6698042", "Source": 1}, "ItemId": {"Id": 116228}, "Quantity": 1, "ExpirationDate": "2013-04-29T", "Description": null}}
{"MessageType": "SALES.SALEITEMCREATED", "Event": {"Id": "ZWbDoMKQw6HDjFzCo8KuwpNmwofCjl7Co8OPwpDCncOSXMOdccKTwp3CiFZkZMKWwpfCpMKZ", "RefInfo": {"TId": {"Id": "ZGA="}, "UserId": {"Id": "ZMKj"}, "SentUtc": "2013-01-04T", "Source": 1}, "Code": {"Code": "074108235206"}, "Sku": {"Sku": "Con CS54"}}}
{"MessageType": "SALES.SALEITEMCREATED", "Event": {"Id": "ZWbDoMKQw6HDjFzCo8KuwpNmwofCjl7Co8OPwpDCncOSXMOdccKTZcKHVsKcwpjClsKXwqTCmQ==", "RefInfo": {"TId": {"Id": "ZGA="}, "UserId": {"Id": "ZMKj"}, "SentUtc": "2013-01-04T", "Source": 1}, "Code": {"Code": "4000000021"}, "Sku": {"Sku": "NFL-Wallet-MK-2201"}}}

Since this data is in Json format first I changed the format and tried to find the key:

import json

data = []
with open("data.raw", "r") as f:
    for line in f:
        data.append(json.loads(line))

for lines in data:
    print(lines.keys())

but it gives me dict_keys(['Event', 'MessageType']) for all the lines. What I need (for this data that I attached) is a list like:

'MessageType' 'Event_Id' 'Event_RefInfo_TId_Id'  'Event_RefInfo_UserId_Id' 'Event_RefInfo_SentUtc' 'Event_RefInfo_Source' 'Event_ItemId_Id' 'Event_ItemId_Quantity' 'Event_ItemId_ExpirationDate'     ...

The data is very big and I just need to find out what attributes do I have.

like image 979
Mina Avatar asked Jan 22 '26 03:01

Mina


1 Answers

You'll need to traverse the nested dicts, your current approach only gets as far as the keys of the root dictionary.

You can use the following generator function to find the keys and traverse nested dicts recursively:

import json 
from pprint import pprint

def find_keys(dct):
    for k, v in dct.items():
        if isinstance(v, dict):
            # traverse nested dict
            for x in find_keys(v):
                yield "{}_{}".format(k, x)
        else:
            yield k

Given a list of dictionaries as derived from your json object, you can find the keys in each dict and put them in a set so entries are unique:

s = set()
for d in json.loads(lst):
    s.update(find_keys(d))

pprint(s)

set(['Event_Code_Code',
     'Event_Description',
     'Event_ExpirationDate',
     'Event_Id',
     'Event_ItemId_Id',
     'Event_Quantity',
     'Event_RefInfo_SentUtc',
     'Event_RefInfo_Source',
     'Event_RefInfo_TId_Id',
     'Event_RefInfo_UserId_Id',
     'Event_Sku_Sku',
     'MessageType'])
like image 163
Moses Koledoye Avatar answered Jan 23 '26 15:01

Moses Koledoye