Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Flatten dictoinary of arbitrary depth

I receive from another script a dictionary containing various types, in particular other dictionaries or lists that might contain other dictionaries as values.

Now what I want to do is create a single flat dictionary. Keys might be present multiple times within the encapsulated dictionaries. For me the inner most key holds the newest information, so I think dict.update is the right routine to apply when digesting a 'inner' dict. By 'inner' dict I mean a dictionary withing some value of the outermost dictionary.

Now, I understand how to flatten a dictionary by 1 level. What I struggle with to to flatten it by arbitrarily many levels.

A simple example example of the type of dictionary I'm dealing with is:

d = {1: {6: {7: {2: {'a'}}}}, 2: 'b', 3: {4: {2: 'c'}}, 5: ['a', 'b', {1: 'a'}]}

My attempt works ok for a single level of depth:

dd = dict()
for k, v in d.items():
    if isinstance(v, dict):
        dd.update(v)
    elif isinstance(v, list):
        for el in v:
            if isinstance(el, dict):
                dd.update(el)
        dd[k] = [el for el in v if not isinstance(el, dict)]
    else:
        dd[k] = v

This gives me:

Out[56]:  {6: {7: {2: {'a'}}}, 2: 'b', 4: {2: 'c'}, 1: 'a', 5: ['a', 'b']}

What it should give is:

{2: 'a', 5: ['a', 'b']}

Note the value of the key 2: 'c' and not (as I get now) 'b'. This should be because the inner-most value for the key 2 is 'c' and not 'b'.

I'm not just looking to get a functioning code (although this would allow me to continue working) but I'd like to understand how such a problem is tackled in python. I have to admit that I'm a little lost here...

Any help is greatly appreciated!

like image 305
TimC Avatar asked Jan 27 '23 23:01

TimC


1 Answers

You can use recursion with a generator and keep a counter to determine the depth:

d = {1: {6: {7: {2: {'a'}}}}, 2: 'b', 3: {4: {2: 'c'}}, 5: ['a', 'b', {1: 'a'}]}
def flatten(_d, _depth = 0):
  for a, b in _d.items():
     if isinstance(b, list):
       yield [a, [i for i in b if not isinstance(i, dict)], _depth]
       for c in b:
          if isinstance(c, dict):
             yield from flatten(c, _depth+1)
     elif isinstance(b, dict):
        yield from flatten(b, _depth+1)
     else:
        yield [a, b, _depth]

_result = {}
for a, b, c in flatten(d):
  if a not in _result:
     _result[a] = [b, c]
  else:
     if _result[a][-1] < c:
       _result[a] = [b, c]
print({a:b for a, [b, c] in _result.items()})

Output:

{2: {'a'}, 5: ['a', 'b'], 1: 'a'}
like image 96
Ajax1234 Avatar answered Feb 04 '23 23:02

Ajax1234