Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find all occurrences of a key in nested dictionaries and lists

I have a dictionary like this:

{ "id" : "abcde",   "key1" : "blah",   "key2" : "blah blah",   "nestedlist" : [      { "id" : "qwerty",       "nestednestedlist" : [          { "id" : "xyz",           "keyA" : "blah blah blah" },         { "id" : "fghi",           "keyZ" : "blah blah blah" }],       "anothernestednestedlist" : [          { "id" : "asdf",           "keyQ" : "blah blah" },         { "id" : "yuiop",           "keyW" : "blah" }] } ] }  

Basically a dictionary with nested lists, dictionaries, and strings, of arbitrary depth.

What is the best way of traversing this to extract the values of every "id" key? I want to achieve the equivalent of an XPath query like "//id". The value of "id" is always a string.

So from my example, the output I need is basically:

["abcde", "qwerty", "xyz", "fghi", "asdf", "yuiop"] 

Order is not important.

like image 995
Matt Swain Avatar asked Mar 21 '12 15:03

Matt Swain


People also ask

How do you count how many times a key appears in a dictionary?

Count of each unique value in a dictionary If you want to count the occurrences of each value in a Python dictionary, you can use the collections. Counter() function on the dictionary values. It returns the number of times each value occurs in the dictionary.

How do you loop through a nested dictionary in Python?

Iterate over all values of a nested dictionary in python We can achieve all this in a simple manner using recursion. Using the function nested_dict_pair_iterator() we iterated over all the values of a dictionary of dictionaries and printed each pair including the parent keys.


2 Answers

I found this Q/A very interesting, since it provides several different solutions for the same problem. I took all these functions and tested them with a complex dictionary object. I had to take two functions out of the test, because they had to many fail results and they did not support returning lists or dicts as values, which i find essential, since a function should be prepared for almost any data to come.

So i pumped the other functions in 100.000 iterations through the timeit module and output came to following result:

0.11 usec/pass on gen_dict_extract(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 6.03 usec/pass on find_all_items(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 0.15 usec/pass on findkeys(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 1.79 usec/pass on get_recursively(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 0.14 usec/pass on find(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 0.36 usec/pass on dict_extract(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

All functions had the same needle to search for ('logging') and the same dictionary object, which is constructed like this:

o = { 'temparature': '50',        'logging': {         'handlers': {           'console': {             'formatter': 'simple',              'class': 'logging.StreamHandler',              'stream': 'ext://sys.stdout',              'level': 'DEBUG'           }         },         'loggers': {           'simpleExample': {             'handlers': ['console'],              'propagate': 'no',              'level': 'INFO'           },          'root': {            'handlers': ['console'],             'level': 'DEBUG'          }        },         'version': '1',         'formatters': {          'simple': {            'datefmt': "'%Y-%m-%d %H:%M:%S'",             'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s'          }        }      },       'treatment': {'second': 5, 'last': 4, 'first': 4},         'treatment_plan': [[4, 5, 4], [4, 5, 4], [5, 5, 5]] } 

All functions delivered the same result, but the time differences are dramatic! The function gen_dict_extract(k,o) is my function adapted from the functions here, actually it is pretty much like the find function from Alfe, with the main difference, that i am checking if the given object has iteritems function, in case strings are passed during recursion:

def gen_dict_extract(key, var):     if hasattr(var,'iteritems'):         for k, v in var.iteritems():             if k == key:                 yield v             if isinstance(v, dict):                 for result in gen_dict_extract(key, v):                     yield result             elif isinstance(v, list):                 for d in v:                     for result in gen_dict_extract(key, d):                         yield result 

So this variant is the fastest and safest of the functions here. And find_all_items is incredibly slow and far off the second slowest get_recursivley while the rest, except dict_extract, is close to each other. The functions fun and keyHole only work if you are looking for strings.

Interesting learning aspect here :)

like image 89
hexerei software Avatar answered Sep 17 '22 21:09

hexerei software


d = { "id" : "abcde",     "key1" : "blah",     "key2" : "blah blah",     "nestedlist" : [      { "id" : "qwerty",         "nestednestedlist" : [          { "id" : "xyz", "keyA" : "blah blah blah" },         { "id" : "fghi", "keyZ" : "blah blah blah" }],         "anothernestednestedlist" : [          { "id" : "asdf", "keyQ" : "blah blah" },         { "id" : "yuiop", "keyW" : "blah" }] } ] }    def fun(d):     if 'id' in d:         yield d['id']     for k in d:         if isinstance(d[k], list):             for i in d[k]:                 for j in fun(i):                     yield j 

>>> list(fun(d)) ['abcde', 'qwerty', 'xyz', 'fghi', 'asdf', 'yuiop'] 
like image 23
kev Avatar answered Sep 16 '22 21:09

kev