Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Recursive Search of Dict with Nested Keys

I recently had to solve a problem in a real data system with a nested dict/list combination. I worked on this for quite a while and came up with a solution, but I am very unsatisfied. I had to resort to using globals() and a named temporary global parameter.

I do not like to use globals. That's just asking for an injection vulnerability. I feel that there must be a better way to perform this task without resorting to globals.

Problem Dataset:

d = {
    "k":1,
    "stuff":"s1",
    "l":{"m":[
        {
            "k":2,
            "stuff":"s2",
            "l":None
        },
        {
            "k":3,
            "stuff":"s3",
            "l":{"m":[
                {
                    "k":4,
                    "stuff":"s4",
                    "l":None
                },
                {
                    "k":5,
                    "stuff":"s5",
                    "l":{"m":[
                        {
                            "k":6,
                            "stuff":"s6",
                            "l":None
                        },
                    ]}
                },
            ]}
        },
    ]}
}

Desired Output:

[{'k': 1, 'stuff': 's1'},
 {'k': 2, 'stuff': 's2'},
 {'k': 3, 'stuff': 's3'},
 {'k': 4, 'stuff': 's4'},
 {'k': 5, 'stuff': 's5'},
 {'k': 6, 'stuff': 's6'}]

My Solution:

def _get_recursive_results(d, iter_key, get_keys):
    if not 'h' in globals():
        global h
        h = []
    h.append({k:d.get(k) for k in get_keys})

    d2 = d.copy()
    for k in iter_key:
        if not d2:
            continue
        d2 = d2.get(k)

    for td in d2:
        d3 = td.copy()
        for k in iter_key:
            if not d3:
                continue
            d3 = d3.get(k)

        if d3:
            return _get_recursive_results(td, iter_key, get_keys)
        h.append({k:td.get(k) for k in get_keys})
    else:
        l = [k for k in h]
        del globals()['h']
        return l

Calling my function as follows returns the desired result:

_get_recursively(d, ['l','m'], ['k','stuff'])

How would I build a better solution?

like image 584
ok123jump Avatar asked Apr 23 '16 08:04

ok123jump


4 Answers

This is a slightly modified version without using globals. Set h to None as default and create a new list for the first call to _get_recursive_results(). Later provide h as an argument in the recursive calls to _get_recursive_results():

def _get_recursive_results(d, iter_key, get_keys, h=None):
    if h is None:
        h = []
    h.append({k:d.get(k) for k in get_keys})
    d2 = d.copy()
    for k in iter_key:
        if not d2:
            continue
        d2 = d2.get(k)
    for td in d2:
        d3 = td.copy()
        for k in iter_key:
            if not d3:
                continue
            d3 = d3.get(k)
        if d3:
            return _get_recursive_results(td, iter_key, get_keys, h)
        h.append({k:td.get(k) for k in get_keys})
    else:
        l = [k for k in h]
        return l

Now:

>>> _get_recursive_results(d, ['l','m'], ['k','stuff'])
[{'k': 1, 'stuff': 's1'},
 {'k': 2, 'stuff': 's2'},
 {'k': 3, 'stuff': 's3'},
 {'k': 4, 'stuff': 's4'},
 {'k': 5, 'stuff': 's5'},
 {'k': 6, 'stuff': 's6'}]

There is no need for the copying of intermediate dicts. This is a further modified version without copying:

def _get_recursive_results(d, iter_key, get_keys, h=None):
    if h is None:
        h = []
    h.append({k: d.get(k) for k in get_keys})
    for k in iter_key:
        if not d:
            continue
        d = d.get(k)
    for td in d:
        d3 = td
        for k in iter_key:
            if not d3:
                continue
            d3 = d3.get(k)
        if d3:
            return _get_recursive_results(td, iter_key, get_keys, h)
        h.append({k: td.get(k) for k in get_keys})
    else:
        return h
like image 162
Mike Müller Avatar answered Oct 16 '22 13:10

Mike Müller


This is not as generic but it does the job:

def parse_tree(d, keys):
   result = [{key: d[key] for key in keys}]
   l = d.get('l', None)
   if l is not None:
       entries = l.get('m', [])
       for entry in entries:
           result.extend(parse_tree(entry))
   return result


>>> parse_tree(d, ['k', 'stuff'])
[{'k': 1, 'stuff': 's1'},
 {'k': 2, 'stuff': 's2'},
 {'k': 3, 'stuff': 's3'},
 {'k': 4, 'stuff': 's4'},
 {'k': 5, 'stuff': 's5'},
 {'k': 6, 'stuff': 's6'}]
like image 35
AKS Avatar answered Oct 16 '22 13:10

AKS


Use generator

With following generator:

def get_stuff(dct, iter_keys, get_keys):
    k, stuff = get_keys
    l, m = iter_keys
    if k in dct:
        yield {k: dct[k], stuff: dct[stuff]}
        if dct.get(l):
            for subdct in dct[l][m]:
                for res in get_stuff(subdct, iter_keys, get_keys):
                    yield res


list(get_stuff(d, ["l", "m"], ["k", "stuff"]))

you get the results by:

list(get_stuff(d))

Python 3.3 provides new yield from expression used to delegate yielding to subgenerator. Using this expression, the code can be one line shorter:

def get_stuff(dct):
    if "k" in dct:
        yield {"k": dct["k"], "stuff": dct["stuff"]}
        if dct.get("l"):
            for subdct in dct["l"]["m"]:
                yield from get_stuff(subdct)

def get_stuff(dct, iter_keys, get_keys):
    k, stuff = get_keys
    l, m = iter_keys
    if k in dct:
        yield {k: dct[k], stuff: dct[stuff]}
        if dct.get(l):
            for subdct in dct[l][m]:
                yield from get_stuff(subdct, iter_keys, get_keys):

Some methods to avoid globals

generators

Often, if you need to build up a list and search for replacing global variables, generators might come handy as they keep status of current work in its local variables plus building up the whole result is postponed to consuming generated values.

recursion

Recursion store the subresults in local variables in stack.

class instance with internal property

A class can serve as a tin to encapsulate your variables.

Instead of using global variable, you store intermediate result in instance property.

Generalize for different data structures

In your comments you mentioned, that you receive many different types with each dump.

I will assume, that your data fulfill following expectations:

  • has tree-like structure
  • each node in the tree shall contribute to result some result (e.g. a dictionary {"k": xx, "stuff": yy})
  • each node might contain subitems (list of subnodes)

One option to make the solution more general is to provide list of keys to use to access the value/subitems, another option is to provide a function, which does the work of getting the node value and subitems.

Here I use get_value to deliver node value and get_subitems to deliver subnodes:

def get_value(data):
    try:
        return {"k": data["k"], "stuff": data["stuff"]}
    except KeyError:
        return None


def get_subitems(data):
    try:
        return data["l"]["m"]
    except TypeError:
        return None

The processing is then done by:

def get_stuff(dct, get_value_fun, get_subitems_fun):
    value = get_value(dct)
    if value:
        yield value
        lst = get_subitems_fun(dct)
        if lst:
            for subdct in lst:
                for res in get_stuff(subdct, get_value_fun, get_subitems_fun):
                    yield res

called in this way:

get_stuff(d, get_value, get_subitems)

Advantage of using functions is that it is much more flexible for whatever data structures you would have to process (adapting to other data structures would require only providing customized version of function get_value and get_subitems - having the same or different names according to your preferences.

like image 4
Jan Vlcinsky Avatar answered Oct 16 '22 14:10

Jan Vlcinsky


Edit: First version had a bug which is now corrected

I believe this should work, we're using the power of recursion!

def strip_leaves_from_tree(my_tree):
    result = list()
    row = dict()
    for key in my_tree:
        child = my_tree[key]
        if type(child) in (int, str,):
            row[key] = child
        elif isinstance(child, dict):
            result = strip_leaves_from_tree(child)
        elif isinstance(child, list):
            for element in child:
                result += strip_leaves_from_tree(element)
    if row: result = [row,]+result
    return result
like image 3
Dion Bridger Avatar answered Oct 16 '22 12:10

Dion Bridger