Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Problems while traversing and extracting the nodes of a tree?

I have the following trees (tree_1, tree_2, tree_3) stored in a dictionary (dict_1, dict_2, dict_3). How can I recursively traverse all the paths of the tree, collecting all the nodes from the root to the last node of each branch?

In other words, I would like to generate a list of all the possible sequences of nodes for all the branches of the trees (dicts). For example some possible branches for dict_1 (tree_1)are:

[["FunctionDef", "try", "ExceptHandler", "Expr", "Call", "Attribute","Load"],
["FunctionDef", "try", "ExceptHandler", "Expr", "Call", "Attribute","save_dictionary"],
["FunctionDef", "try", "ExceptHandler", "Expr", "Call", "Attribute","Name", "self"],
..., 
["FunctionDef", "arguments", "arg", "self"]]

So far, from a previous question I tried to:

def foo(nested_dict, c = []):
   for i in ['left', 'op', 'right', 'func', 'value', 'args', 'ctx',
             'body', 'comparators', 'ops', 'test', 'orelse', 'targets', 'slice', 'n', 'id', '_type']:
      if i in nested_dict:
        if isinstance(nested_dict[i], list):
            for b in nested_dict[i]:
                yield from foo(b, c+[nested_dict['_type']])
        elif isinstance(nested_dict[i], dict): #simple check here
            yield from foo(nested_dict[i], c+[nested_dict['_type']])
        else:
            yield c+[nested_dict[i]]

and

def foo_2(nested_dict, c = []):
  targets = {'left', 'op', 'right', 'func', 'value', 'args', 'ctx', 'body',
             'comparators', 'ops', 'test', 'orelse', 'targets', 'slice', 'n',
             'id', 'slice', 'annotation', 'arg', 'elts', 's', '_type'}
  for a, b in nested_dict.items():
     if a in targets:
        if isinstance(b, dict):
           yield from foo_2(b, c+[a])
        elif isinstance(b, list):
           for i in b:
              yield from foo_2(i, c+[a])
        else:
            yield c+[b]

However, both of them don't work because I am getting the wrong sequences. I tried to modify the targets because I thought that this issue was related to the fact that a target can not be reached out. Nevertheless, I am getting either incomplete or incorrect paths, and in general it is still not working, any idea of how to generate one list per path given a tree?

Here's a minimal example, given this tree:

{'_type': 'Expr',
 'col_offset': 0,
 'lineno': 1,
 'value': {'_type': 'Call',
  'args': [{'_type': 'BinOp',
    'col_offset': 6,
    'left': {'_type': 'Num', 'col_offset': 6, 'lineno': 1, 'n': 1},
    'lineno': 1,
    'op': {'_type': 'Add'},
    'right': {'_type': 'Num', 'col_offset': 8, 'lineno': 1, 'n': 2}}],
  'col_offset': 0,
  'func': {'_type': 'Name',
   'col_offset': 0,
   'ctx': {'_type': 'Load'},
   'id': 'print',
   'lineno': 1},
  'keywords': [],
  'lineno': 1}}

The output should be:

[["Expr", "Call", "Name", "print"],
["Expr", "Call", "Name", "Load"],
["Expr", "Call", "Binop", "Num", "1"],
["Expr", "Call", "Binop", "add"],
["Expr", "Call", "Binop", "Num", "2"]]
like image 641
J Do Avatar asked Nov 06 '22 15:11

J Do


1 Answers

A first small fix for function 1 is to notice branches ending with a dict containing only _type and branches ending with a simpler object (id for example), are both handled with your else, but require two different things:

  1. The first will be handled with a dict recursive call, and so will add the previous _type node.
  2. The second one, will get to else in the previous node iteration, but only adds itself, meaning you forgot the "current" _type node. So the else becomes:

    elif i == '_type':
        #Got here recursively, previous node already added
        yield c+[nested_dict[i]]
    else:
        #Got here in the iteration of the "previous" node, so need to add it as well.
        yield c+[nested_dict['_type'],nested_dict[i]]
    

No you will get all your branches, but you also get all sub-branches (['Expr'],['Expr','Call'],['Expr','Call','BinOp']...). This means we are yielding at a wrong place somewhere! We are yielding _type nodes now even if they are not leaves. It's also clear that we always need c to have _type anyway. Second solution that pops to mind:

def foo(nested_dict, c = []):
    yielded = False
    c = c+[nested_dict['_type']] #Good place for try...except and validation check
    for i in ['left', 'op', 'right', 'func', 'value', 'args', 'ctx',
    'body', 'comparators', 'ops', 'test', 'orelse', 'targets', 'slice', 'n', 'id']:
        if i in nested_dict:
            yielded = True
            if isinstance(nested_dict[i], list):
                for b in nested_dict[i]:
                    yield from foo(b, c)
            elif isinstance(nested_dict[i], dict): #simple check here
                yield from foo(nested_dict[i], c)
            else:
                yield c+[nested_dict[i]]
    #`_type` is leaf
    if not yielded: yield c

Note I removed _type from the oeprations since there is no point in iterating it now. This way we can just use else in the loop. Function 2 is the same thing, so I leave you to fix it.

like image 110
kabanus Avatar answered Nov 14 '22 20:11

kabanus