Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently accessing arbitrarily deep dictionaries

Suppose I have a multi-level dictionary like this

mydict = {     'first': {         'second': {             'third': {                 'fourth': 'the end'              }          }      } } 

I'd like to access it like this

test = get_entry(mydict, 'first.second.third.fourth') 

What I have so far is

def get_entry(dict, keyspec):     keys = keyspec.split('.')      result = dict[keys[0]]     for key in keys[1:]:        result = dict[key]      return result 

Are there more efficient ways to do it? According to %timeit the runtime of the function is 1.26us, while accessing the dictionary the standard way like this

foo = mydict['first']['second']['third']['fourth'] 

takes 541ns. I'm looking for ways to trim it to 800ns range if possible.

Thanks

like image 501
Rick Manix Avatar asked Apr 20 '18 16:04

Rick Manix


People also ask

How do you flatten nested dictionary?

Basically the same way you would flatten a nested list, you just have to do the extra work for iterating the dict by key/value, creating new keys for your new dictionary and creating the dictionary at final step. For Python >= 3.3, change the import to from collections.

Can a dictionary have arbitrary data types in Python?

The values of a dictionary can be any type of Python data. So, dictionaries are unordered key-value-pairs.

What is flattened dictionary?

a : to make level or smooth. b : to knock down also : to defeat decisively. c : to make dull or uninspired —often used with out. d : to make lusterless flatten paint. e : to stabilize especially at a lower level.

Can dictionaries can be nested to any depth?

Dictionaries can be nested to any depth. Dictionaries are mutable. Items are accessed by their position in a dictionary. All the keys in a dictionary must be of the same type.


1 Answers

I got a 20% performance boost by tightening up the code a bit but a whopping 400% increase by using a cache for split strings. That only makes a difference if you use the same spec multiple times. Here are sample implementations and a profile script to test.

test.py

mydict = {     'first': {         'second': {             'third': {                 'fourth': 'the end'              }          }      } }  # original def get_entry(dict, keyspec):     keys = keyspec.split('.')      result = dict[keys[0]]     for key in keys[1:]:        result = result[key]      return result  # tighten up code def get_entry_2(mydict, keyspec):     for key in keyspec.split('.'):         mydict = mydict[key]     return mydict  # use a cache cache = {} def get_entry_3(mydict, keyspec):     global cache     try:         spec = cache[keyspec]     except KeyError:         spec = tuple(keyspec.split('.'))         cache[keyspec] = spec      for key in spec:         mydict = mydict[key]     return mydict  if __name__ == "__main__":     test = get_entry(mydict, 'first.second.third.fourth')     print(test) 

profile.py

from timeit import timeit print("original get_entry") print(timeit("get_entry(mydict, 'first.second.third.fourth')",     setup="from test import get_entry, mydict"))  print("get_entry_2 with tighter code") print(timeit("get_entry_2(mydict, 'first.second.third.fourth')",     setup="from test import get_entry_2, mydict"))  print("get_entry_3 with cache of split spec") print(timeit("get_entry_3(mydict, 'first.second.third.fourth')",     setup="from test import get_entry_3, mydict"))  print("just splitting a spec") print(timeit("x.split('.')", setup="x='first.second.third.fourth'")) 

The timing on my machine is

original get_entry 4.148535753000033 get_entry_2 with tighter code 3.2986323120003362 get_entry_3 with cache of split spec 1.3073233439990872 just splitting a spec 1.0949148639992927 

Notice that splitting the spec is a comparatively expensive operation for this function. That's why caching helps.

like image 189
tdelaney Avatar answered Sep 21 '22 17:09

tdelaney