I'm trying to find all the attributes of the data in a nested dictionary in Python. Some objects have keys nested several levels deep. How can I find the header of such nested data (thinking of it as a table structure)? Here are a few lines of my data to show what it looks like:
{"MessageType": "SALES.HOLDCREATED", "Event": {"Id": "ZWbDoMKQw6HDjFzCo8KuwpNmwofCjl7Co8OPwpDCncOSXMOdccKTZVVWZWbCnA==", "RefInfo": {"TId": {"Id": "ZMKXwpbClsOhwpNiw5E="}, "UserId": {"Id": "wpzCksKWwpbCpMKTYsKeZMKZbA=="}, "SentUtc": "2013-04-28T16:59:48.6698042", "Source": 1}, "ItemId": {"Id": 116228}, "Quantity": 1, "ExpirationDate": "2013-04-29T", "Description": null}}
{"MessageType": "SALES.SALEITEMCREATED", "Event": {"Id": "ZWbDoMKQw6HDjFzCo8KuwpNmwofCjl7Co8OPwpDCncOSXMOdccKTwp3CiFZkZMKWwpfCpMKZ", "RefInfo": {"TId": {"Id": "ZGA="}, "UserId": {"Id": "ZMKj"}, "SentUtc": "2013-01-04T", "Source": 1}, "Code": {"Code": "074108235206"}, "Sku": {"Sku": "Con CS54"}}}
{"MessageType": "SALES.SALEITEMCREATED", "Event": {"Id": "ZWbDoMKQw6HDjFzCo8KuwpNmwofCjl7Co8OPwpDCncOSXMOdccKTZcKHVsKcwpjClsKXwqTCmQ==", "RefInfo": {"TId": {"Id": "ZGA="}, "UserId": {"Id": "ZMKj"}, "SentUtc": "2013-01-04T", "Source": 1}, "Code": {"Code": "4000000021"}, "Sku": {"Sku": "NFL-Wallet-MK-2201"}}}
Since this data is in JSON format, I first parsed it and tried to list the keys:
import json
data = []
with open("data.raw", "r") as f:
    for line in f:
        data.append(json.loads(line))
for lines in data:
    print(lines.keys())
but it gives me dict_keys(['Event', 'MessageType']) for all the lines.
What I need (for this data that I attached) is a list like:
'MessageType' 'Event_Id' 'Event_RefInfo_TId_Id' 'Event_RefInfo_UserId_Id' 'Event_RefInfo_SentUtc' 'Event_RefInfo_Source' 'Event_ItemId_Id' 'Event_Quantity' 'Event_ExpirationDate' ...
The data is very large, and I just need to find out which attributes it contains.
You'll need to traverse the nested dicts; your current approach only gets as far as the keys of the root dictionary.
You can use the following generator function to find the keys and traverse nested dicts recursively:
import json
from pprint import pprint
def find_keys(dct):
    for k, v in dct.items():
        if isinstance(v, dict):
            # traverse nested dict
            for x in find_keys(v):
                yield "{}_{}".format(k, x)
        else:
            yield k
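To see what the generator produces, here is a minimal sketch that runs it on one small record. The `record` dict is a hypothetical, trimmed-down version of the sample data in the question, and `find_keys` is repeated only so the snippet runs on its own.

```python
def find_keys(dct):
    """Yield one underscore-joined key path per non-dict value in dct."""
    for k, v in dct.items():
        if isinstance(v, dict):
            # Prefix every path found in the nested dict with the current key.
            for x in find_keys(v):
                yield "{}_{}".format(k, x)
        else:
            yield k

# Hypothetical record, shortened from the question's sample data.
record = {
    "MessageType": "SALES.HOLDCREATED",
    "Event": {
        "Id": "abc",
        "RefInfo": {"TId": {"Id": "x"}, "Source": 1},
        "Quantity": 1,
    },
}

paths = list(find_keys(record))
print(paths)
# ['MessageType', 'Event_Id', 'Event_RefInfo_TId_Id', 'Event_RefInfo_Source', 'Event_Quantity']
```

Note that only values that are themselves dicts are descended into; a list of dicts would be reported as a single key, and an empty nested dict yields no path at all.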
Given a list of dictionaries as derived from your JSON data, you can find the keys in each dict and put them in a set so entries are unique:
s = set()
# lst is assumed to be a JSON string that encodes a list of dicts
for d in json.loads(lst):
    s.update(find_keys(d))
pprint(s)
set(['Event_Code_Code',
     'Event_Description',
     'Event_ExpirationDate',
     'Event_Id',
     'Event_ItemId_Id',
     'Event_Quantity',
     'Event_RefInfo_SentUtc',
     'Event_RefInfo_Source',
     'Event_RefInfo_TId_Id',
     'Event_RefInfo_UserId_Id',
     'Event_Sku_Sku',
     'MessageType'])
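Since the question's file holds one JSON object per line and is described as very large, one possible variation is to update the set while reading, so that no list of parsed records is kept in memory. The sketch below is an assumption-laden illustration, not part of the original answer: `collect_keys` is a hypothetical helper name, and an `io.StringIO` with two made-up lines stands in for the real file.

```python
import io
import json

def find_keys(dct):
    """Yield one underscore-joined key path per non-dict value in dct."""
    for k, v in dct.items():
        if isinstance(v, dict):
            for x in find_keys(v):
                yield "{}_{}".format(k, x)
        else:
            yield k

def collect_keys(lines):
    """Return the set of key paths seen across an iterable of JSON lines."""
    keys = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        keys.update(find_keys(json.loads(line)))
    return keys

# Stand-in for the real file: two made-up records, one JSON object per line.
sample = io.StringIO(
    '{"MessageType": "A", "Event": {"Id": "x", "ItemId": {"Id": 1}, "Quantity": 1}}\n'
    '{"MessageType": "B", "Event": {"Id": "y", "Code": {"Code": "074"}}}\n'
)

header = sorted(collect_keys(sample))
print(header)
# ['Event_Code_Code', 'Event_Id', 'Event_ItemId_Id', 'Event_Quantity', 'MessageType']
```

With the actual data, the same helper could be fed an open file handle instead, for example `with open("data.raw") as f: header = sorted(collect_keys(f))`.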