Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

extract trees from DAG

I have a DAG expressed as nodes and their successor edges. Reifying it as a nested data structure is possible with a simple recursive function.

#tree1.pl
#!/usr/bin/env perl
use 5.028; use strictures; use Moops; use Kavorka qw(fun); use List::AllUtils qw(first);
class Node :ro {
    has label => isa => Str;
    has children => isa => ArrayRef[Str];
}
fun N($label, $children) {
    return Node->new(label => $label, children => $children);
}

# list is really flat, but
# indentation outlines desired tree structure
our @dag = (
    N(N0 => ['N1']),
        N(N1 => ['N2']),
            N(N2 => ['N3']),
                N(N3 => ['N4', 'N5']),
                    N(N4 => []),
                    N(N5 => []),
);

fun tree(Node $n) {
    return bless [
        map {
            my $c = $_;
            tree(first {
                $_->label eq $c
            } @dag)
        } $n->children->@*
    ] => $n->label;
}

tree($dag[0]);
# bless([ #N0
#     bless([ #N1
#         bless([ #N2
#             bless([ #N3
#                 bless([] => 'N4'),
#                 bless([] => 'N5'),
#             ] => 'N3')
#         ] => 'N2')
#     ] => 'N1')
# ] => 'N0')

That was the trivial case.


In my application, I encounter the complication that the DAG contains multiple nodes with the same label.

our @dag = (
    N(N0 => ['N1']),
    N(N1 => ['N2']),
    ︙
    N(N1 => ['N6', 'N5']),
    ︙

Note that this does not mean that there is a multiedge in the proper sense.

This is wrong because now N1 appears to have three equal children.

The N1 nodes must not be collapsed into one node for graph traversal purpose, only for labelling the output tree; so in other words these nodes must be of distinct identity. Let's visualise this with colours.

our @dag = (
    N(N0 => ['N1']),
    N([N1 => 'red'] => ['N2']),
    ︙
    N([N1 => 'blue'] => ['N6', 'N5']),
    ︙

The goal is to reify this DAG as two trees. Follow each of the dotted successor edges in separate passes. I achieve this by remembering the index number of one colour on the node when I pass it, and during next tree construction I pick the next colour in order.

#tree2.pl
#!/usr/bin/env perl
use 5.028; use strictures; use Moops; use Kavorka qw(fun); use List::AllUtils qw(first);
class Node :ro {
    has label => isa => Str;
    has col => isa => Maybe[Str];
    has children => isa => ArrayRef[Str];
    has col_seen => is => 'rw', isa => Int;
}
fun N($c_l, $children) {
    return ref $c_l
        ? Node->new(label => $c_l->[0], col => $c_l->[1], children => $children)
        : Node->new(label => $c_l, children => $children);
}

# indentation outlines desired tree structure
our @dag = (
    ### start 1st tree
    N(N0 => ['N1']),
        N([N1 => 'red'] => ['N2']),
            N(N2 => ['N3']),
                N(N3 => ['N4', 'N5']),
                    N(N4 => []),
                    N(N5 => []),
    ### end 1st tree

    ### start 2nd tree
    # N0
        N([N1 => 'blue'] => ['N6', 'N5']),
            N(N6 => ['N7']),
                N(N7 => ['N4']),
                    # N4
            # N5
    ### end 2nd tree
);

fun tree(Node $n) {
    return bless [
        map {
            my $c = $_;
            my @col = map { $_->col } grep { $_->label eq $c } @dag;
            if (@col > 1) {
                $n->col_seen($n->col_seen + 1);
                die 'exhausted' if $n->col_seen > @col;
                tree(first {
                    $_->label eq $c && $_->col eq $col[$n->col_seen - 1]
                } @dag);
            } else {
                tree(first { $_->label eq $c } @dag);
            }
        } $n->children->@*
    ] => $n->label;
}

tree($dag[0]);
# bless([ #N0
#     bless([ #N1
#         bless([ #N2
#             bless([ #N3
#                 bless([] => 'N4'),
#                 bless([] => 'N5')
#             ] => 'N3')
#         ] => 'N2')
#     ] => 'N1')
# ] => 'N0')

tree($dag[0]);
# bless([ #N0
#     bless([ #N1
#         bless([ #N6
#             bless([ #N7
#                 bless([] => 'N4')
#             ] => 'N7')
#         ] => 'N6'),
#         bless([] => 'N5')
#     ] => 'N1')
# ] => 'N0')

tree($dag[0]);
# exhausted

That code works, I get two trees.


However, there is a problem with my code when I have several of those nodes with coloured successors. Same code as above, just the input is different:

#tree3.pl

︙

our @dag = (
    N(N0 => ['N1']),
        N([N1 => 'red'] => ['N2']),
            N(N2 => ['N3']),
                N(N3 => ['N4', 'N5']),
                    N(N4 => []),
                    N(N5 => []),
    # N0
        N([N1 => 'blue'] => ['N6', 'N5']),
            N(N6 => ['N7']),
                N(N7 => ['N8', 'N4']),
                    N([N8 => 'purple'] => ['N5']),
                        # N5
                    N([N8 => 'orange'] => []),
                    N([N8 => 'cyan'] => ['N5', 'N5']),
                        # N5
                        # N5
                    # N4
            # N5
);

︙

tree($dag[0]);
# bless([ #N0
#     bless([ #N1
#         bless([ #N2
#             bless([ #N3
#                 bless([] => 'N4'),
#                 bless([] => 'N5')
#             ] => 'N3')
#         ] => 'N2')
#     ] => 'N1')
# ] => 'N0')
tree($dag[0]);
# bless([ #N0
#     bless([ #N1
#         bless([ #N6
#             bless([ #N7
#                 bless([ #N8
#                     bless([] => 'N5')
#                 ] => 'N8'),
#                 bless([] => 'N4')
#             ] => 'N7')
#         ] => 'N6'),
#         bless([] => 'N5')
#     ] => 'N1')
# ] => 'N0')
tree($dag[0]);
# exhausted

The problem is that the search exhausts after only two trees, although I should get four:

  • path through red
  • path through blue, then purple
  • path through blue, then orange
  • path through blue, then cyan

You can answer in any programming language.

like image 349
daxim Avatar asked Jan 02 '19 17:01

daxim


1 Answers

Is the following what you were aiming to accomplish? (python 3)

from collections import defaultdict
from itertools import product

class bless:
    def __init__(self, label, children):
        self.label = label
        self.children = children

    def __repr__(self):
        return self.__str__()

    # Just pretty-print stuff
    def __str__(self):
        formatter = "\n{}\n" if self.children else "{}"
        formatted_children = formatter.format(",\n".join(map(str, self.children)))
        return "bless([{}] => '{}')".format(formatted_children, self.label)

class Node:
    def __init__(self, label, children):
        self.label = label
        self.children = children

class DAG:
    def __init__(self, nodes):
        self.nodes = nodes

        # Add the root nodes to a singular, generated root node (for simplicity)
        # This is not necessary to implement the color-separation logic,
        # it simply lessens the number of edge cases I must handle to demonstate
        # the logic. Your existing code will work fine without this "hack"
        non_root = {child for node in self.nodes for child in node.children}
        root_nodes = [node.label for node in self.nodes if node.label not in non_root]
        self.root = Node("", root_nodes)

        # Make a list of all the trees
        self.tree_list = self.make_trees(self.root)

    def tree(self):
        if self.tree_list:
            return self.tree_list.pop(0)
        return list()

    # This is the meat of the program, and is really the logic you are after
    # Its a recursive function that parses the tree top-down from our "made-up"
    # root, and makes <bless>s from the nodes. It returns a list of all separately
    # colored trees, and if prior (recusive) calls already made multiple trees, it
    # will take the cartesian product of each tree per label
    def make_trees(self, parent):
        # A defaultdict is just a hashtable that's empty values
        # default to some data type (list here)
        trees = defaultdict(list)
        # This is some nasty, inefficient means of fetching the children
        # your code already does this more efficiently in perl, and since it
        # contributes nothing to the answer, I'm not wasting time refactoring it
        for node in (node for node in self.nodes if node.label in parent.children):
            # I append the tree(s) found in the child to the list of <label>s trees
            trees[node.label] += self.make_trees(node)
        # This line serves to re-order the trees since the dictionary doesn't preserve
        # ordering, and also restores any duplicated that would be lost
        values = [trees[label] for label in parent.children]
        # I take the cartesian product of all the lists of trees each label
        # is associated with in the dictionary. So if I have
        #    [N1-subtree] [red-N2-subtree, blue-N2-subtree] [N3-subtree]
        # as children of N0, then I'll return:
        # [bless(N0, [N1-st, red-N2-st, N3-st]), bless(N0, [N1-st, blue-N2-st, N3-st])]
        return [bless(parent.label, prod) for prod in product(*values)]

if __name__ == "__main__":
    N0  = Node('N0', ['N1'])
    N1a = Node('N1', ['N2'])
    N2  = Node('N2', ['N3'])
    N3  = Node('N3', ['N4', 'N5'])
    N4  = Node('N4', [])
    N5  = Node('N5', [])

    N1b = Node('N1', ['N6', 'N5'])
    N6  = Node('N6', ['N7'])
    N7  = Node('N7', ['N8', 'N4'])
    N8a = Node('N8', ['N5'])
    N8b = Node('N8', [])
    N8c = Node('N8', ['N5', 'N5'])

    dag = DAG([N0, N1a, N2, N3, N4, N5, N1b, N6, N7, N8a, N8b, N8c])

    print(dag.tree())
    print(dag.tree())
    print(dag.tree())
    print(dag.tree())
    print(dag.tree())
    print(dag.tree())

I explained the logic fairly thoroughly through comments, but just to clarify- I generate all the possible trees at once using a recursive DFS from the root. To ensure there's only one root, I make a "fictional" root that contains all other nodes that do not have a parent, and then start the search on that one node. This is not necessary for the algorithm to work, I just wanted to simplify the logic that didn't directly pertain to your question.

In this DFS, I create a hash table / dictionary of lists per label, and store all the distinct subtrees that can be made from each child in these lists. For most nodes this list will be of length 1 since most nodes will generate a single tree unless their label or a (sub)child has duplicate labels. Regardless, I take the cartesian product of all these lists, and form new bless objects (from each product). I return this list, and the process repeats up the call stack until we finally have our complete list of trees.

All of the printing logic is unnecessary (obviously), but I wanted to make it easier for you to verify if this is indeed the behavior you want. I could not (easily) get it to indent nested blesss, but that should be trivial to manually adjust. The only real part of interest is the make_trees() function, the rest is just setting up things for validation or making the code as easily comparable to your perl code as I could manage.

Formatted output:

bless([
    bless([
        bless([
            bless([
                bless([
                    bless([] => 'N4'),
                    bless([] => 'N5')
                ] => 'N3')
            ] => 'N2')
        ] => 'N1')
    ] => 'N0')
] => '')
bless([
    bless([
        bless([
            bless([
                bless([
                    bless([
                        bless([] => 'N5')
                    ] => 'N8'),
                    bless([] => 'N4')
                ] => 'N7')
            ] => 'N6'),
            bless([] => 'N5')
        ] => 'N1')
    ] => 'N0')
] => '')
bless([
    bless([
        bless([
            bless([
                bless([
                    bless([] => 'N8'),
                    bless([] => 'N4')
                ] => 'N7')
            ] => 'N6'),
            bless([] => 'N5')
        ] => 'N1')
    ] => 'N0')
] => '')
bless([
    bless([
        bless([
            bless([
                bless([
                    bless([
                        bless([] => 'N5'),
                        bless([] => 'N5')
                    ] => 'N8'),
                    bless([] => 'N4')
                ] => 'N7')
            ] => 'N6'),
            bless([] => 'N5')
        ] => 'N1')
    ] => 'N0')
] => '')
[]
[]
like image 91
Dillon Davis Avatar answered Nov 10 '22 20:11

Dillon Davis