I recently had to solve a problem in a real data system with a nested dict/list combination. I worked on this for quite a while and came up with a solution, but I am very unsatisfied. I had to resort to using globals()
and a named temporary global parameter.
I do not like to use globals. That's just asking for an injection vulnerability. I feel that there must be a better way to perform this task without resorting to globals.
Problem Dataset:
d = {
"k":1,
"stuff":"s1",
"l":{"m":[
{
"k":2,
"stuff":"s2",
"l":None
},
{
"k":3,
"stuff":"s3",
"l":{"m":[
{
"k":4,
"stuff":"s4",
"l":None
},
{
"k":5,
"stuff":"s5",
"l":{"m":[
{
"k":6,
"stuff":"s6",
"l":None
},
]}
},
]}
},
]}
}
Desired Output:
[{'k': 1, 'stuff': 's1'},
{'k': 2, 'stuff': 's2'},
{'k': 3, 'stuff': 's3'},
{'k': 4, 'stuff': 's4'},
{'k': 5, 'stuff': 's5'},
{'k': 6, 'stuff': 's6'}]
My Solution:
def _get_recursive_results(d, iter_key, get_keys):
if not 'h' in globals():
global h
h = []
h.append({k:d.get(k) for k in get_keys})
d2 = d.copy()
for k in iter_key:
if not d2:
continue
d2 = d2.get(k)
for td in d2:
d3 = td.copy()
for k in iter_key:
if not d3:
continue
d3 = d3.get(k)
if d3:
return _get_recursive_results(td, iter_key, get_keys)
h.append({k:td.get(k) for k in get_keys})
else:
l = [k for k in h]
del globals()['h']
return l
Calling my function as follows returns the desired result:
_get_recursively(d, ['l','m'], ['k','stuff'])
How would I build a better solution?
This is a slightly modified version without using globals. Set h
to None
as default and create a new list for the first call to _get_recursive_results()
. Later provide h
as an argument in the recursive calls to _get_recursive_results()
:
def _get_recursive_results(d, iter_key, get_keys, h=None):
if h is None:
h = []
h.append({k:d.get(k) for k in get_keys})
d2 = d.copy()
for k in iter_key:
if not d2:
continue
d2 = d2.get(k)
for td in d2:
d3 = td.copy()
for k in iter_key:
if not d3:
continue
d3 = d3.get(k)
if d3:
return _get_recursive_results(td, iter_key, get_keys, h)
h.append({k:td.get(k) for k in get_keys})
else:
l = [k for k in h]
return l
Now:
>>> _get_recursive_results(d, ['l','m'], ['k','stuff'])
[{'k': 1, 'stuff': 's1'},
{'k': 2, 'stuff': 's2'},
{'k': 3, 'stuff': 's3'},
{'k': 4, 'stuff': 's4'},
{'k': 5, 'stuff': 's5'},
{'k': 6, 'stuff': 's6'}]
There is no need for the copying of intermediate dicts. This is a further modified version without copying:
def _get_recursive_results(d, iter_key, get_keys, h=None):
if h is None:
h = []
h.append({k: d.get(k) for k in get_keys})
for k in iter_key:
if not d:
continue
d = d.get(k)
for td in d:
d3 = td
for k in iter_key:
if not d3:
continue
d3 = d3.get(k)
if d3:
return _get_recursive_results(td, iter_key, get_keys, h)
h.append({k: td.get(k) for k in get_keys})
else:
return h
This is not as generic but it does the job:
def parse_tree(d, keys):
result = [{key: d[key] for key in keys}]
l = d.get('l', None)
if l is not None:
entries = l.get('m', [])
for entry in entries:
result.extend(parse_tree(entry))
return result
>>> parse_tree(d, ['k', 'stuff'])
[{'k': 1, 'stuff': 's1'},
{'k': 2, 'stuff': 's2'},
{'k': 3, 'stuff': 's3'},
{'k': 4, 'stuff': 's4'},
{'k': 5, 'stuff': 's5'},
{'k': 6, 'stuff': 's6'}]
With following generator:
def get_stuff(dct, iter_keys, get_keys):
k, stuff = get_keys
l, m = iter_keys
if k in dct:
yield {k: dct[k], stuff: dct[stuff]}
if dct.get(l):
for subdct in dct[l][m]:
for res in get_stuff(subdct, iter_keys, get_keys):
yield res
list(get_stuff(d, ["l", "m"], ["k", "stuff"]))
you get the results by:
list(get_stuff(d))
Python 3.3 provides new yield from
expression used to delegate yielding to subgenerator. Using this expression, the code can be one line shorter:
def get_stuff(dct):
if "k" in dct:
yield {"k": dct["k"], "stuff": dct["stuff"]}
if dct.get("l"):
for subdct in dct["l"]["m"]:
yield from get_stuff(subdct)
def get_stuff(dct, iter_keys, get_keys):
k, stuff = get_keys
l, m = iter_keys
if k in dct:
yield {k: dct[k], stuff: dct[stuff]}
if dct.get(l):
for subdct in dct[l][m]:
yield from get_stuff(subdct, iter_keys, get_keys):
globals
Often, if you need to build up a list and search for replacing global variables, generators might come handy as they keep status of current work in its local variables plus building up the whole result is postponed to consuming generated values.
Recursion store the subresults in local variables in stack.
A class can serve as a tin to encapsulate your variables.
Instead of using global variable, you store intermediate result in instance property.
In your comments you mentioned, that you receive many different types with each dump.
I will assume, that your data fulfill following expectations:
{"k": xx, "stuff": yy}
)One option to make the solution more general is to provide list of keys to use to access the value/subitems, another option is to provide a function, which does the work of getting the node value and subitems.
Here I use get_value
to deliver node value and get_subitems
to deliver subnodes:
def get_value(data):
try:
return {"k": data["k"], "stuff": data["stuff"]}
except KeyError:
return None
def get_subitems(data):
try:
return data["l"]["m"]
except TypeError:
return None
The processing is then done by:
def get_stuff(dct, get_value_fun, get_subitems_fun):
value = get_value(dct)
if value:
yield value
lst = get_subitems_fun(dct)
if lst:
for subdct in lst:
for res in get_stuff(subdct, get_value_fun, get_subitems_fun):
yield res
called in this way:
get_stuff(d, get_value, get_subitems)
Advantage of using functions is that it is much more flexible for whatever data
structures you would have to process (adapting to other data structures would require only providing customized version of function get_value
and get_subitems
- having the same or different names according to your preferences.
Edit: First version had a bug which is now corrected
I believe this should work, we're using the power of recursion!
def strip_leaves_from_tree(my_tree):
result = list()
row = dict()
for key in my_tree:
child = my_tree[key]
if type(child) in (int, str,):
row[key] = child
elif isinstance(child, dict):
result = strip_leaves_from_tree(child)
elif isinstance(child, list):
for element in child:
result += strip_leaves_from_tree(element)
if row: result = [row,]+result
return result
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With