Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Javascript: How to control flow with async recursive tree traversal?

I need to recurse over a tree to perform operations on specific nodes using async operations. How can I control flow so I have access to the nodes when it's done?

Here's an example situation:

data = {
  name: "deven",
  children: [
    { name: "andrew" },
    { name: "donovan" },
    { name: "james",
      children: [
        { name: "donatello" },
        { name: "dan" }
      ]
    },
    { name: "jimmy",
      children: [
        { name: "mike" },
        { name: "dank" }
      ]
    }
  ]
};

I have a function who's goal it is to iterate through the tree and capitalize all names that start with 'd'. Afterwards, I want to pass the tree into another function to do some more work (possibly delete all nodes that have a name that starts with 'a'), but only after the initial processing has been done:

function capitalize_d(node) {
    if(node.name === "d") {
        node.name = node.name.toUpperCase();
    }

    if(node.children != null) {
        for(var i = 0; i < node.children.length; i++) {
            capitalize_d(node.children[i]);
        }
    }
}

function remove_a(node) {
}

capitalize_d(data);

// Should only get called after all the d's have been capitalized.
remove_a(data);

The above code works fine, because capitalize_d is blocking. If capitalize_d recurses asynchronously, how can we guarantee remove_a is called after it's done? Note the setTimeout call in capitalize_d.

function capitalize_d(node) {
    setTimeout(function() {

        if(node.name === "d") {
            node.name = node.name.toUpperCase();
        }

        if(node.children != null) {
            for(var i = 0; i < node.children.length; i++) {
                capitalize_d(node.children[i]);
            }
        }

    }, 1);
}

function remove_a(node) {
}

capitalize_d(data);

// Should only get called after all the d's have been capitalized.
remove_a(data);

The problem is we have processing for different branches of the tree all getting fired off at the same time, and it's impossible to tell when it's finally done processing the tree.

How can I solve this?

like image 944
agscala Avatar asked Mar 28 '13 15:03

agscala


2 Answers

Let me summarize what I understood of your requirements:

  • you have a data tree where each single node can be altered asynchronously by a set of operations (capitalize_d and remove_a in your example),
  • you want to make sure every single node has been subjected to a given operation before allowing the next one.

I have spent 10 years or more designing real-time embedded software, and believe me, the requirements in this area are meaner and scarrier than anything most network programmers will experience in their entire life. This makes me warn you that you seem to be headed the seriously wrong way here.

As I can imagine it, your problem is to organize a set of individual data in some sort of meaningful structure. Some process collects random bits of information (what you call 'nodes' in your example), and at some point you want to put all those nodes into a consistent, monolithic data structure (a hierarchical tree in your example).

In other words, you have three tasks at hand:

  • a data acquisition process that will collect nodes asynchronously
  • a data production process that will present a consistent, refined data tree
  • a controller process that will synchronize data acquisition and production (possibly directly the user interface if the two above processes are smart enough, but don't count on it too much).

My advice: don't try to do acquisition and production at the same time.

Just to give you an idea of the nightmare you're headed for:

  • depending on how the operations are triggered, there is a possibility that the tree will never be completely processed by a given operation. Let's say the controlling software forgets to call capitalize_d on a few nodes, remove_a will simply never get the green light

  • conversely, if you fire at the tree at random, it is very likely that some nodes will get processed multiple times, unless you keep track of the operation coverage to prevent applying the same transformation twice to a given node

  • if you want remove_a processing to ever start, you might have to prevent the controlling software from sending any more capitalize_d requests, or else the light might stay stuck to red forever. You will end up doing flow control over your requests one way or another (or worse : you will not do any, and your system will be likely to freeze to death should the operation flow wander away from the sweet spot you happened to hit by chance).

  • if an operation alters the structure of the tree (as remove_a obviously does), you have to prevent concurrent accesses. At the very least, you should lock the subtree starting from the node remove_a is working on, or else you would allow processing a subtree that is likely to be asynchronously altered and/or destroyed.

Well it's doable. I've seen fine youg men earning big money doing variations on this theme. They usually spent a couple of evenings each week eating pizzas in front of their computers too, but hey, that's how you can tell hardboiled hackers from the crowd of quiche eaters, right?...

I assume you being posting this question here means you don't really want to do this. Now if you boss does, well, to quote a famous android, I can't lie to you about your chances, but... you have my sympathies.

Now seriously folks.. This is the way I would tackle the problem.

1) take a snapshot of your data at a given point in time

you can weed out raw data using as many criteria as you can (last data acquisition too old, incorrect input, whatever allows you to build the smallest possible tree).

2) build the tree with the snapshot, and then apply whatever capitalize_d, remove_a and camelize_z operations sequentially on this given snapshot.

In parallel, the data acquisition process will continue collecting new nodes or updating existing ones, ready to take the next snapshot.

Besides, you can move some of your processing forward. Obviously capitalize_d does not take any advantage of the tree structure, so you can apply capitalize_d to each node in the snapshot before the tree is even built. You might even be able to apply some of the transformations earlier, i.e. on each collected sample. This can save you a lot of processing time and code complexity.

To end with a bit of theroretical babble,

  • your approach is to consider the data tree a shared object that should support concurrent acces from the data acquisition and data production processes,
  • my approach is to make the data acquisition process serve (asynchronously) consistent sets of data to a data production process, which can then handle sequentially the said set of data.

the data production process could be triggered on demand (say when the end user clicks the "show me something" button), in which case the reactivity would be rather poor: the user would be stuck watching an hourglass or whatever Web2.0 sexy spinning wheel for the time needed to build and process the tree (lets's say 7-8 seconds).

or you could activate the data production process periodically (feed it with a new snapshot every 10 seconds, a period safely above the mean processing time of a data set). The "show me something" button would then just present the last set of completed data. An immediate answer, but with data that could be 10 seconds older than the last recieved samples.

I rarely saw cases where this was not considered acceptable, especially when you produce a bunch of complex data an operator needs a few dozen seconds to digest.

In theory, my approach will lose some reactivity, since the data processed will be slightly outdated, but the concurrent access approach would probably result in an even slower software (and most certainly 5-10 times bigger and buggier).

like image 141
kuroi neko Avatar answered Oct 19 '22 00:10

kuroi neko


I know this post is old, but it came up in search results, and the lone response doesn't provide a working example, so here's a modified version of something I recently did...

function processTree(rootNode, onComplete) {

    // Count of outstanding requests.
    // Upon a return of any request,
    // if this count is zero, we know we're done.
    var outstandingRequests = 0;

    // A list of processed nodes,
    // which is used to handle artifacts
    // of non-tree graphs (cycles, etc).
    // Technically, since we're processing a "tree",
    // this logic isn't needed, and could be
    // completely removed.
    //
    // ... but this also gives us something to inspect
    // in the sample test code. :)
    var processedNodes = [];

    function markRequestStart() {
        outstandingRequests++;
    }

    function markRequestComplete() {
        outstandingRequests--;
        // We're done, let's execute the overall callback
        if (outstandingRequests < 1) { onComplete(processedNodes); }
    }

    function processNode(node) {
        // Kickoff request for this node
        markRequestStart();
        // (We use a regular HTTP GET request as a
        // stand-in for any asynchronous action)
        jQuery.get("/?uid="+node.uid, function(data) {
            processedNodes[node.uid] = data;
        }).fail(function() {
            console.log("Request failed!");
        }).always(function() {
            // When the request returns:
            // 1) Mark it as complete in the ref count
            // 2) Execute the overall callback if the ref count hits zero
            markRequestComplete();
        });

        // Recursively process all child nodes (kicking off requests for each)
        node.children.forEach(function (childNode) {
            // Only process nodes not already processed
            // (only happens for non-tree graphs,
            // which could include cycles or multi-parent nodes)
            if (processedNodes.indexOf(childNode.uid) < 0) {
                processNode(childNode);
            }
        });

    }

    processNode(rootNode);
}

And here's example usage using QUnit:

QUnit.test( "async-example", function( assert ) {
    var done = assert.async();

    var root = {
        uid: "Root",
        children: [{
            uid: "Node A",
            children: [{
                uid: "Node A.A",
                children: []
            }]
        },{
            uid: "Node B",
            children: []
        }]
    };

    processTree(root, function(processedNodes) {
        assert.equal(Object.keys(processedNodes).length, 4);
        assert.ok(processedNodes['Root']);
        assert.ok(processedNodes['Node A']);
        assert.ok(processedNodes['Node A.A']);
        assert.ok(processedNodes['Node B']);
        done();
    });
});
like image 38
Ryan Avatar answered Oct 18 '22 22:10

Ryan