I need to recurse over a tree to perform operations on specific nodes using async operations. How can I control flow so I have access to the nodes when it's done?
Here's an example situation:
data = {
name: "deven",
children: [
{ name: "andrew" },
{ name: "donovan" },
{ name: "james",
children: [
{ name: "donatello" },
{ name: "dan" }
]
},
{ name: "jimmy",
children: [
{ name: "mike" },
{ name: "dank" }
]
}
]
};
I have a function who's goal it is to iterate through the tree and capitalize all names that start with 'd'. Afterwards, I want to pass the tree into another function to do some more work (possibly delete all nodes that have a name that starts with 'a'), but only after the initial processing has been done:
function capitalize_d(node) {
if(node.name === "d") {
node.name = node.name.toUpperCase();
}
if(node.children != null) {
for(var i = 0; i < node.children.length; i++) {
capitalize_d(node.children[i]);
}
}
}
function remove_a(node) {
}
capitalize_d(data);
// Should only get called after all the d's have been capitalized.
remove_a(data);
The above code works fine, because capitalize_d is blocking. If capitalize_d
recurses asynchronously, how can we guarantee remove_a
is called after it's done? Note the setTimeout
call in capitalize_d
.
function capitalize_d(node) {
setTimeout(function() {
if(node.name === "d") {
node.name = node.name.toUpperCase();
}
if(node.children != null) {
for(var i = 0; i < node.children.length; i++) {
capitalize_d(node.children[i]);
}
}
}, 1);
}
function remove_a(node) {
}
capitalize_d(data);
// Should only get called after all the d's have been capitalized.
remove_a(data);
The problem is we have processing for different branches of the tree all getting fired off at the same time, and it's impossible to tell when it's finally done processing the tree.
How can I solve this?
Let me summarize what I understood of your requirements:
capitalize_d
and remove_a
in your example),I have spent 10 years or more designing real-time embedded software, and believe me, the requirements in this area are meaner and scarrier than anything most network programmers will experience in their entire life. This makes me warn you that you seem to be headed the seriously wrong way here.
As I can imagine it, your problem is to organize a set of individual data in some sort of meaningful structure. Some process collects random bits of information (what you call 'nodes' in your example), and at some point you want to put all those nodes into a consistent, monolithic data structure (a hierarchical tree in your example).
In other words, you have three tasks at hand:
My advice: don't try to do acquisition and production at the same time.
Just to give you an idea of the nightmare you're headed for:
depending on how the operations are triggered, there is a possibility that the tree will never be completely processed by a given operation. Let's say the controlling software forgets to call capitalize_d
on a few nodes, remove_a
will simply never get the green light
conversely, if you fire at the tree at random, it is very likely that some nodes will get processed multiple times, unless you keep track of the operation coverage to prevent applying the same transformation twice to a given node
if you want remove_a
processing to ever start, you might have to prevent the controlling software from sending any more capitalize_d
requests, or else the light might stay stuck to red forever. You will end up doing flow control over your requests one way or another (or worse : you will not do any, and your system will be likely to freeze to death should the operation flow wander away from the sweet spot you happened to hit by chance).
if an operation alters the structure of the tree (as remove_a
obviously does), you have to prevent concurrent accesses. At the very least, you should lock the subtree starting from the node remove_a
is working on, or else you would allow processing a subtree that is likely to be asynchronously altered and/or destroyed.
Well it's doable. I've seen fine youg men earning big money doing variations on this theme. They usually spent a couple of evenings each week eating pizzas in front of their computers too, but hey, that's how you can tell hardboiled hackers from the crowd of quiche eaters, right?...
I assume you being posting this question here means you don't really want to do this. Now if you boss does, well, to quote a famous android, I can't lie to you about your chances, but... you have my sympathies.
Now seriously folks.. This is the way I would tackle the problem.
1) take a snapshot of your data at a given point in time
you can weed out raw data using as many criteria as you can (last data acquisition too old, incorrect input, whatever allows you to build the smallest possible tree).
2) build the tree with the snapshot, and then apply whatever capitalize_d, remove_a and camelize_z operations sequentially on this given snapshot.
In parallel, the data acquisition process will continue collecting new nodes or updating existing ones, ready to take the next snapshot.
Besides, you can move some of your processing forward. Obviously capitalize_d
does not take any advantage of the tree structure, so you can apply capitalize_d
to each node in the snapshot before the tree is even built. You might even be able to apply some of the transformations earlier, i.e. on each collected sample.
This can save you a lot of processing time and code complexity.
To end with a bit of theroretical babble,
the data production process could be triggered on demand (say when the end user clicks the "show me something" button), in which case the reactivity would be rather poor: the user would be stuck watching an hourglass or whatever Web2.0 sexy spinning wheel for the time needed to build and process the tree (lets's say 7-8 seconds).
or you could activate the data production process periodically (feed it with a new snapshot every 10 seconds, a period safely above the mean processing time of a data set). The "show me something" button would then just present the last set of completed data. An immediate answer, but with data that could be 10 seconds older than the last recieved samples.
I rarely saw cases where this was not considered acceptable, especially when you produce a bunch of complex data an operator needs a few dozen seconds to digest.
In theory, my approach will lose some reactivity, since the data processed will be slightly outdated, but the concurrent access approach would probably result in an even slower software (and most certainly 5-10 times bigger and buggier).
I know this post is old, but it came up in search results, and the lone response doesn't provide a working example, so here's a modified version of something I recently did...
function processTree(rootNode, onComplete) {
// Count of outstanding requests.
// Upon a return of any request,
// if this count is zero, we know we're done.
var outstandingRequests = 0;
// A list of processed nodes,
// which is used to handle artifacts
// of non-tree graphs (cycles, etc).
// Technically, since we're processing a "tree",
// this logic isn't needed, and could be
// completely removed.
//
// ... but this also gives us something to inspect
// in the sample test code. :)
var processedNodes = [];
function markRequestStart() {
outstandingRequests++;
}
function markRequestComplete() {
outstandingRequests--;
// We're done, let's execute the overall callback
if (outstandingRequests < 1) { onComplete(processedNodes); }
}
function processNode(node) {
// Kickoff request for this node
markRequestStart();
// (We use a regular HTTP GET request as a
// stand-in for any asynchronous action)
jQuery.get("/?uid="+node.uid, function(data) {
processedNodes[node.uid] = data;
}).fail(function() {
console.log("Request failed!");
}).always(function() {
// When the request returns:
// 1) Mark it as complete in the ref count
// 2) Execute the overall callback if the ref count hits zero
markRequestComplete();
});
// Recursively process all child nodes (kicking off requests for each)
node.children.forEach(function (childNode) {
// Only process nodes not already processed
// (only happens for non-tree graphs,
// which could include cycles or multi-parent nodes)
if (processedNodes.indexOf(childNode.uid) < 0) {
processNode(childNode);
}
});
}
processNode(rootNode);
}
And here's example usage using QUnit:
QUnit.test( "async-example", function( assert ) {
var done = assert.async();
var root = {
uid: "Root",
children: [{
uid: "Node A",
children: [{
uid: "Node A.A",
children: []
}]
},{
uid: "Node B",
children: []
}]
};
processTree(root, function(processedNodes) {
assert.equal(Object.keys(processedNodes).length, 4);
assert.ok(processedNodes['Root']);
assert.ok(processedNodes['Node A']);
assert.ok(processedNodes['Node A.A']);
assert.ok(processedNodes['Node B']);
done();
});
});
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With