Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Nested dictionaries: extracting paths to leaves [closed]

Tags:

python

I have a Python nested dictionary as the following:

{'dist_river': 
  {'high': 
    {'wind_speed': 
      {'1': 
        {'population': 
          {'high': 
            {'school': 
              {'high':'T', 'medium':'T', 'low':'F'}
            }, 
            'medium': 
              {'land_cover': 
                {'Mix_garden': 
                  {'income_source': 
                    {'Plantation':'T', 'Agriculture':'F'}
                  }
                }
              }
            }
          }
        }
      },
    'low': 'F'
  }
}

How can I get subdictionaries from the nested dictionary ?. For example, subdictionaries from dic:

results = [
    {'dist_river': 
     {'high': 
      {'wind_speed': 
       {'1': 
        {'population': 
         {'high': 
          {'school': 
           {'high': 'T', 'medium': 'T', 'low': 'F'}
          }}}}}}},

    {'dist_river': 
     {'high': 
      {'wind_speed': 
       {'1': 
        {'population': 
         {'medium': 
          {'land_cover': 
           {'Mix_garden': 
            {'income_source': 
             {'Plantation': 'T', 'Agriculture': 'F'}
            }}}}}}}}},

    {'dist_river': 
     {'low': 'F'}
    }
]

lengths(results) == 3

Thank you for your help

community edit: It appears that each resulting dictionary must have only one entry for each level of nesting. Put another way, each result contains the entire path to each leaf in the dictionary tree. – Tim Pietzcker 13 hours ago

like image 228
Imas Avatar asked Oct 16 '25 12:10

Imas


2 Answers

import collections

def isDict(d):
    return isinstance(d, collections.Mapping)

def isAtomOrFlat(d):
    return not isDict(d) or not any(isDict(v) for v in d.values())

def leafPaths(nestedDicts, noDeeper=isAtomOrFlat):
    """
        For each leaf in NESTEDDICTS, this yields a 
        dictionary consisting of only the entries between the root
        and the leaf.
    """
    for key,value in nestedDicts.items():
        if noDeeper(value):
            yield {key: value}
        else:
            for subpath in leafPaths(value):
                yield {key: subpath}

Demo:

>>> pprint.pprint(list( leafPaths(dic) ))
[{'dist_river': {'high': {'wind_speed': {'1': {'population': {'high': {'school': {'high': 'T',
                                                                                  'low': 'F',
                                                                                  'medium': 'T'}}}}}}}},
 {'dist_river': {'high': {'wind_speed': {'1': {'population': {'medium': {'land_cover': {'Mix_garden': {'income_source': {'Agriculture': 'F',
                                                                                                                         'Plantation': 'T'}}}}}}}}}},
 {'dist_river': {'low': 'F'}}]

sidenote 1: However unless this format was warranted for some reason I personally feel it would be better to yield nodes in the manner of tuples, e.g. something like:

...noDeeper=lambda x:not isDict(x)...
...yield tuple(value)
...yield (key,)+subpath

[('dist_river', 'high', 'wind_speed', '1', 'population', 'high', 'school', 'high', 'T'),
 ('dist_river', 'high', 'wind_speed', '1', 'population', 'high', 'school', 'medium', 'T'),
 ('dist_river', 'high', 'wind_speed', '1', 'population', 'high', 'school', 'low', 'F'),
 ('dist_river', 'high', 'wind_speed', '1', 'population', 'medium', 'land_cover', 'Mix_garden', 'income_source', 'Plantation', 'T'),
 ('dist_river', 'high', 'wind_speed', '1', 'population', 'medium', 'land_cover', 'Mix_garden', 'income_source', 'Agriculture', 'F'),
 ('dist_river', 'low', 'F')]

(Extracted easily from the "straightforward" answer, which happens to be thg435's answer.)


sidenote 2: Do note that the naive implementation is not what the OP is looking for. The naive implementation would have noDeeper=lambda x:not isDict(x), with the result of:

>>> pprint.pprint(list( leafPaths(dic) ))
[{'dist_river': {'high': {'wind_speed': {'1': {'population': {'high': {'school': {'high': 'T'}}}}}}}},
 {'dist_river': {'high': {'wind_speed': {'1': {'population': {'high': {'school': {'medium': 'T'}}}}}}}},
 {'dist_river': {'high': {'wind_speed': {'1': {'population': {'high': {'school': {'low': 'F'}}}}}}}},
 {'dist_river': {'high': {'wind_speed': {'1': {'population': {'medium': {'land_cover': {'Mix_garden': {'income_source': {'Plantation': 'T'}}}}}}}}}},
 {'dist_river': {'high': {'wind_speed': {'1': {'population': {'medium': {'land_cover': {'Mix_garden': {'income_source': {'Agriculture': 'F'}}}}}}}}}},
 {'dist_river': {'low': 'F'}}]

Edit: this is an inefficient algorithm. Each leaf L is re-yielded depth(L) times. More efficient would be to chain generators maybe with a custom data structure, or simulate the stack manually.

like image 186
ninjagecko Avatar answered Oct 19 '25 03:10

ninjagecko


Maybe this:

def enum_paths(p):
    if not hasattr(p, 'items'):
        yield p
    else:
        for k, v in p.items():
            for x in enum_paths(v):
                yield {k: x}


for x in enum_paths(dic):
    print x
like image 28
georg Avatar answered Oct 19 '25 01:10

georg