Find all occurrences of a key in nested dictionaries and lists

Tags:

I have a dictionary like this:

{ "id" : "abcde",   "key1" : "blah",   "key2" : "blah blah",   "nestedlist" : [      { "id" : "qwerty",       "nestednestedlist" : [          { "id" : "xyz",           "keyA" : "blah blah blah" },         { "id" : "fghi",           "keyZ" : "blah blah blah" }],       "anothernestednestedlist" : [          { "id" : "asdf",           "keyQ" : "blah blah" },         { "id" : "yuiop",           "keyW" : "blah" }] } ] }

Basically a dictionary with nested lists, dictionaries, and strings, of arbitrary depth.

What is the best way of traversing this to extract the values of every "id" key? I want to achieve the equivalent of an XPath query like "//id". The value of "id" is always a string.

So from my example, the output I need is basically:

["abcde", "qwerty", "xyz", "fghi", "asdf", "yuiop"]

Order is not important.

995

asked Mar 21 '12 15:03

Matt Swain

2 Answers

I found this Q/A very interesting, since it provides several different solutions for the same problem. I took all these functions and tested them with a complex dictionary object. I had to take two functions out of the test, because they had to many fail results and they did not support returning lists or dicts as values, which i find essential, since a function should be prepared for almost any data to come.

So i pumped the other functions in 100.000 iterations through the timeit module and output came to following result:

0.11 usec/pass on gen_dict_extract(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 6.03 usec/pass on find_all_items(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 0.15 usec/pass on findkeys(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 1.79 usec/pass on get_recursively(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 0.14 usec/pass on find(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 0.36 usec/pass on dict_extract(k,o) - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

All functions had the same needle to search for ('logging') and the same dictionary object, which is constructed like this:

o = { 'temparature': '50',        'logging': {         'handlers': {           'console': {             'formatter': 'simple',              'class': 'logging.StreamHandler',              'stream': 'ext://sys.stdout',              'level': 'DEBUG'           }         },         'loggers': {           'simpleExample': {             'handlers': ['console'],              'propagate': 'no',              'level': 'INFO'           },          'root': {            'handlers': ['console'],             'level': 'DEBUG'          }        },         'version': '1',         'formatters': {          'simple': {            'datefmt': "'%Y-%m-%d %H:%M:%S'",             'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s'          }        }      },       'treatment': {'second': 5, 'last': 4, 'first': 4},         'treatment_plan': [[4, 5, 4], [4, 5, 4], [5, 5, 5]] }

All functions delivered the same result, but the time differences are dramatic! The function gen_dict_extract(k,o) is my function adapted from the functions here, actually it is pretty much like the find function from Alfe, with the main difference, that i am checking if the given object has iteritems function, in case strings are passed during recursion:

def gen_dict_extract(key, var):     if hasattr(var,'iteritems'):         for k, v in var.iteritems():             if k == key:                 yield v             if isinstance(v, dict):                 for result in gen_dict_extract(key, v):                     yield result             elif isinstance(v, list):                 for d in v:                     for result in gen_dict_extract(key, d):                         yield result

So this variant is the fastest and safest of the functions here. And find_all_items is incredibly slow and far off the second slowest get_recursivley while the rest, except dict_extract, is close to each other. The functions fun and keyHole only work if you are looking for strings.

Interesting learning aspect here :)

answered Sep 17 '22 21:09

hexerei software

d = { "id" : "abcde",     "key1" : "blah",     "key2" : "blah blah",     "nestedlist" : [      { "id" : "qwerty",         "nestednestedlist" : [          { "id" : "xyz", "keyA" : "blah blah blah" },         { "id" : "fghi", "keyZ" : "blah blah blah" }],         "anothernestednestedlist" : [          { "id" : "asdf", "keyQ" : "blah blah" },         { "id" : "yuiop", "keyW" : "blah" }] } ] }    def fun(d):     if 'id' in d:         yield d['id']     for k in d:         if isinstance(d[k], list):             for i in d[k]:                 for j in fun(i):                     yield j

>>> list(fun(d)) ['abcde', 'qwerty', 'xyz', 'fghi', 'asdf', 'yuiop']

answered Sep 16 '22 21:09

kev

Related questions
                            
                                Choosing the correct upper and lower HSV boundaries for color detection with`cv::inRange` (OpenCV)
                            
                                Public free web services for testing soap client [closed]
                            
                                Why are assertEquals() parameters in the order (expected, actual)?
                            
                                WhatsApp API (java/python) [closed]
                            
                                What is the role of TimeDistributed layer in Keras?
                            
                                Add numpy array as column to Pandas data frame
                            
                                Python regex - r prefix
                            
                                Two-sample Kolmogorov-Smirnov Test in Python Scipy
                            
                                How to do an upsert with SqlAlchemy?
                            
                                background function in Python
                            
                                Why can I use a list index as an indexing variable in a for loop? [duplicate]
                            
                                Imshow: extent and aspect
                            
                                How to transform numpy.matrix or array to scipy sparse matrix
                            
                                What exactly is the point of memoryview in Python
                            
                                Safely create a file if and only if it does not exist with Python
                            
                                Is enumerate in python lazy?
                            
                                Using ConfigParser to read a file without section name
                            
                                Using a pre-trained word embedding (word2vec or Glove) in TensorFlow
                            
                                Dynamically import a method in a file, from a string
                            
                                is it possible to do fuzzy match merge with python pandas?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Find all occurrences of a key in nested dictionaries and lists

Tags:

python

dictionary

recursion

traversal

Matt Swain

People also ask

2 Answers

hexerei software

kev

Recent Activity

Donate For Us