I have a dictionary like this:
{ "id" : "abcde", "key1" : "blah", "key2" : "blah blah", "nestedlist" : [ { "id" : "qwerty", "nestednestedlist" : [ { "id" : "xyz", "keyA" : "blah blah blah" }, { "id" : "fghi", "keyZ" : "blah blah blah" }], "anothernestednestedlist" : [ { "id" : "asdf", "keyQ" : "blah blah" }, { "id" : "yuiop", "keyW" : "blah" }] } ] }
Basically a dictionary with nested lists, dictionaries, and strings, of arbitrary depth.
What is the best way of traversing this to extract the values of every "id" key? I want to achieve the equivalent of an XPath query like "//id". The value of "id" is always a string.
So from my example, the output I need is basically:
["abcde", "qwerty", "xyz", "fghi", "asdf", "yuiop"]
Order is not important.
Count of each unique value in a dictionary If you want to count the occurrences of each value in a Python dictionary, you can use the collections. Counter() function on the dictionary values. It returns the number of times each value occurs in the dictionary.
Iterate over all values of a nested dictionary in python We can achieve all this in a simple manner using recursion. Using the function nested_dict_pair_iterator() we iterated over all the values of a dictionary of dictionaries and printed each pair including the parent keys.
I found this Q/A very interesting, since it provides several different solutions for the same problem. I took all these functions and tested them with a complex dictionary object. I had to take two functions out of the test, because they had to many fail results and they did not support returning lists or dicts as values, which i find essential, since a function should be prepared for almost any data to come.
So i pumped the other functions in 100.000 iterations through the timeit
module and output came to following result:
0.11 usec/pass on gen_dict_extract(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 6.03 usec/pass on find_all_items(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 0.15 usec/pass on findkeys(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 1.79 usec/pass on get_recursively(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 0.14 usec/pass on find(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 0.36 usec/pass on dict_extract(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
All functions had the same needle to search for ('logging') and the same dictionary object, which is constructed like this:
o = { 'temparature': '50', 'logging': { 'handlers': { 'console': { 'formatter': 'simple', 'class': 'logging.StreamHandler', 'stream': 'ext://sys.stdout', 'level': 'DEBUG' } }, 'loggers': { 'simpleExample': { 'handlers': ['console'], 'propagate': 'no', 'level': 'INFO' }, 'root': { 'handlers': ['console'], 'level': 'DEBUG' } }, 'version': '1', 'formatters': { 'simple': { 'datefmt': "'%Y-%m-%d %H:%M:%S'", 'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s' } } }, 'treatment': {'second': 5, 'last': 4, 'first': 4}, 'treatment_plan': [[4, 5, 4], [4, 5, 4], [5, 5, 5]] }
All functions delivered the same result, but the time differences are dramatic! The function gen_dict_extract(k,o)
is my function adapted from the functions here, actually it is pretty much like the find
function from Alfe, with the main difference, that i am checking if the given object has iteritems function, in case strings are passed during recursion:
def gen_dict_extract(key, var): if hasattr(var,'iteritems'): for k, v in var.iteritems(): if k == key: yield v if isinstance(v, dict): for result in gen_dict_extract(key, v): yield result elif isinstance(v, list): for d in v: for result in gen_dict_extract(key, d): yield result
So this variant is the fastest and safest of the functions here. And find_all_items
is incredibly slow and far off the second slowest get_recursivley
while the rest, except dict_extract
, is close to each other. The functions fun
and keyHole
only work if you are looking for strings.
Interesting learning aspect here :)
d = { "id" : "abcde", "key1" : "blah", "key2" : "blah blah", "nestedlist" : [ { "id" : "qwerty", "nestednestedlist" : [ { "id" : "xyz", "keyA" : "blah blah blah" }, { "id" : "fghi", "keyZ" : "blah blah blah" }], "anothernestednestedlist" : [ { "id" : "asdf", "keyQ" : "blah blah" }, { "id" : "yuiop", "keyW" : "blah" }] } ] } def fun(d): if 'id' in d: yield d['id'] for k in d: if isinstance(d[k], list): for i in d[k]: for j in fun(i): yield j
>>> list(fun(d)) ['abcde', 'qwerty', 'xyz', 'fghi', 'asdf', 'yuiop']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With