I'm having something of a hard time determining what is asynchronous and what is not while running CasperJS, what must be wrapped in then() statements, and what is going to be evaluated when.
I'll run into a problem somewhere that has to do with a fall-through break statement, variable scope, or the evaluate() statement, and I'll start wrapping all my code in then() statements... which turns out to not be the problem.
I notice that my code runs on two levels when I step through it, an evaluation level that parses the code, and then come the then() statements. Also, my print statements appear in a sometimes inexplicable order.
My question: how do these then() statements actually get queued? I've read the docs, and I sort of understand. I want to understand the rules and have some cut and dried ways to determine what is sync and what is async.
I've even read parts of a book on async coding, but nothing really seems to address CasperJS structure specifically. Any resources?
Also, what's best practice for where to put your then() statements? Should they be peppered liberally throughout, or should they be in the controlling main casper.begin() function that calls the others?
Thanks folks, I'm used to PHP.
Rule of thumb: All CasperJS functions which contain the words then and wait are asynchronous. This statement has many exceptions.
What is then() doing?
CasperJS is organized as a series of steps that handle the control flow of your script. then() handles the many PhantomJS/SlimerJS event types that define the ending of a step. When then() is called, the passed function is put into a step queue, which is simply a JavaScript array. If the previous step has finished, either because it was a simple synchronous function or because CasperJS detected that specific events were triggered, the next step begins execution, and this repeats until all steps are executed.
All those step functions are bound to the casper object, so you can refer to that object using this.
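To make the queueing concrete, here is a deliberately simplified model of a step queue in plain JavaScript. It is not the CasperJS source: MiniCasper, its log array, and the fully synchronous run() are assumptions made purely for illustration (real steps may wait for page events before the next one starts).

```javascript
// Simplified model of a step queue; NOT the actual CasperJS implementation.
// MiniCasper and its members are hypothetical names used for illustration.
function MiniCasper() {
    this.steps = [];   // the step queue is a plain array
    this.log = [];     // collects output so the order can be inspected
}

// then() only schedules: it pushes the function and returns for chaining.
MiniCasper.prototype.then = function (fn) {
    this.steps.push(fn);
    return this;
};

MiniCasper.prototype.echo = function (msg) {
    this.log.push(msg);
};

// run() executes the queued steps in order, each bound to the instance.
MiniCasper.prototype.run = function () {
    for (var i = 0; i < this.steps.length; i++) {
        this.steps[i].call(this);
    }
    return this;
};

var mini = new MiniCasper();
mini.then(function () { this.echo("step 1"); });
mini.echo("outside any step");   // runs immediately, while steps are still being queued
mini.then(function () { this.echo("step 2"); });
mini.run();

console.log(mini.log.join(", ")); // outside any step, step 1, step 2
```

The point of the sketch is only the ordering: anything written outside a step function runs while the queue is being built, before any step executes.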
The following simple script shows two steps:
casper.start("http://example.com", function(){
    this.echo(this.getTitle());
}).run();
The first step is an implicit asynchronous ("stepped") open() call behind start(). The start() function also takes an optional callback which itself is the second step in this script.
During the execution of the first step the page is opened. When the page is completely loaded, PhantomJS triggers the onLoadFinished event, CasperJS triggers its own events and continues with the next step. The second step is a simple, completely synchronous function, so nothing fancy is happening here. When this is done, CasperJS exits, because there are no more steps to execute.
There is an exception to this rule: When a function is passed into the run() function, it will be executed as the last step instead of the default exit. If you don't call exit() or die() in there, you will need to kill the process.
How does then() detect that the next step has to wait?
Take the following example:
casper.then(function(){
    this.echo(this.getTitle());
    this.fill(...);
    this.click("#search");
}).then(function(){
    this.echo(this.getTitle());
});
If an event that denotes the loading of a new page is triggered during the execution of a step, CasperJS will wait for the page to load before executing the next step. In this case a click was triggered, which itself triggered an onNavigationRequested event from the underlying browser. CasperJS sees this and suspends execution using callbacks until the next page is loaded. Other such triggers may be form submissions or client JavaScript doing its own redirect with window.open()/window.location.
Of course, this breaks down when we are talking about single page applications (with a static URL). PhantomJS cannot detect that, for example, a different template is being rendered after a click, and therefore cannot wait until it has finished loading (this can take some time when data is loaded from the server). If the following steps depend on the new page, you will need to use e.g. waitUntilVisible() to look for a selector that is unique to the page to be loaded.
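A minimal sketch of that approach, written as a function that receives the casper instance. The selector "#results", the function name and the 5000 ms timeout are assumptions for illustration; "#search" is taken from the example above.

```javascript
// Sketch for a single page application where a click does not load a new page.
// "#results" is a hypothetical selector for an element that is only visible
// once the new view has been rendered; 5000 ms is an arbitrary timeout.
function searchThenWaitForResults(casper) {
    casper.then(function () {
        this.click("#search");
    });
    casper.waitUntilVisible("#results", function () {
        // Runs only after the selector has become visible.
        this.echo(this.getTitle());
    }, function onTimeout() {
        // Runs instead if the selector did not become visible in time.
        this.echo("results view did not appear in time");
    }, 5000);
}
```

In a script this would be called between casper.start(...) and casper.run(), so that both steps end up in the queue before execution begins.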
Some people call it Promises, because of the way steps can be chained. Aside from the name (then()) and an action chain, that's the end of the similarities. There is no result that is passed from callback to callback through the step chain in CasperJS. Either you store your result in a global variable or add it to the casper object. Error handling is also limited: when an error is encountered, CasperJS will die in the default configuration.
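As a small sketch of carrying a value from one step to the next: the variable here lives in an enclosing function rather than in the global scope, which is a variation on the same idea. collectTitles and firstTitle are hypothetical names, and "#search" is borrowed from the earlier example.

```javascript
// Sketch: step callbacks do not receive the previous step's result,
// so the value is kept in a variable that both step functions can see.
function collectTitles(casper) {
    var firstTitle; // shared by both steps through the closure

    casper.then(function () {
        firstTitle = this.getTitle();
        this.click("#search");
    });
    casper.then(function () {
        this.echo("before: " + firstTitle + ", after: " + this.getTitle());
    });
}
```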
I prefer to call it a Builder pattern, because the execution starts as soon as you call run() and every call before is only there to put steps into the queue (see 1st question). That is why it doesn't make sense to write synchronous functions outside of step functions. Simply put, they are executed without any context. The page has not even begun loading.
Of course, calling it a builder pattern is not the whole truth. Steps can be nested, which means that if you schedule a step inside of another step, it will be put into the queue after the current step and after all the other steps that were already scheduled from the current step. (That's a lot of steps!)
The following script is a good illustration of what I mean:
casper.on("load.finished", function(){
    this.echo("1 -> 3");
});
casper.on("load.started", function(){
    this.echo("2 -> 2");
});
casper.start('http://example.com/');
casper.echo("3 -> 1");
casper.then(function() {
    this.echo("4 -> 4");
    this.then(function() {
        this.echo("5 -> 6");
        this.then(function() {
            this.echo("6 -> 8");
        });
        this.echo("7 -> 7");
    });
    this.echo("8 -> 5");
});
casper.then(function() {
    this.echo("9 -> 9");
});
casper.run();
The first number shows the position of the synchronous code snippet in the script and the second one shows the actual executed/printed position, because echo() is synchronous.
Important points:
To avoid confusion and hard-to-find problems, always call asynchronous functions after the synchronous functions in a single step. If that seems impossible, split the step into multiple steps or consider recursion.
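The nesting rule can be reproduced with a small plain-JavaScript model. This is a sketch under assumptions, not the CasperJS source: StepQueue, insertAt and running are hypothetical names, and the page load events from the script above are left out, so only the echo() calls numbered 3 to 9 appear.

```javascript
// Simplified model of nested step scheduling; NOT the CasperJS source.
function StepQueue() {
    this.steps = [];
    this.log = [];
    this.running = false;
    this.insertAt = 0;
}

StepQueue.prototype.then = function (fn) {
    if (this.running) {
        // Scheduled from inside a step: goes right after the current step and
        // after anything the current step has already scheduled.
        this.steps.splice(this.insertAt, 0, fn);
        this.insertAt++;
    } else {
        // Scheduled before run(): simply appended to the queue.
        this.steps.push(fn);
    }
    return this;
};

StepQueue.prototype.echo = function (msg) { this.log.push(msg); };

StepQueue.prototype.run = function () {
    this.running = true;
    for (var i = 0; i < this.steps.length; i++) {
        this.insertAt = i + 1;      // nested steps land after the current one
        this.steps[i].call(this);
    }
    this.running = false;
};

var q = new StepQueue();
q.echo("3");
q.then(function () {
    this.echo("4");
    this.then(function () {
        this.echo("5");
        this.then(function () { this.echo("6"); });
        this.echo("7");
    });
    this.echo("8");
});
q.then(function () { this.echo("9"); });
q.run();

console.log(q.log.join(" ")); // 3 4 8 5 7 6 9
```

The printed order matches the second numbers in the script above once the two load events are removed: synchronous code in a step always finishes before any step scheduled from within it starts.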
How does waitFor() work?
waitFor() is the most flexible function in the wait* family, because every other function uses this one.
In its most basic form (passing only one check function and nothing else), waitFor() schedules one step. The check function that is passed into it is called repeatedly until the condition is met or the (global) timeout is reached. If a then and/or onTimeout step function is passed as well, it is called when the condition is met or when the timeout is reached, respectively.
It is important to note that if waitFor() times out, the script will stop execution if you did not pass in the onTimeout callback function, which is essentially an error catch function:
casper.start().waitFor(function checkCb(){
    return false;
}, function thenCb(){
    this.echo("inner then");
}, null, 1000).then(function() {
    this.echo("outer");
}).run();
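The three possible outcomes can be modelled without a browser. The following is a synchronous simulation with a fake clock, not how CasperJS implements waitFor(): simulateWaitFor, the 100 ms polling interval and the returned strings are all assumptions for illustration.

```javascript
// Simplified, synchronous model of waitFor() polling; NOT the CasperJS source.
function simulateWaitFor(check, then, onTimeout, timeout) {
    var interval = 100;   // assumed polling interval in fake milliseconds
    for (var elapsed = 0; elapsed < timeout; elapsed += interval) {
        if (check(elapsed)) {
            if (then) { then(); }
            return "condition met";
        }
    }
    if (onTimeout) {
        onTimeout();
        return "timeout handled";
    }
    // No onTimeout callback was given: this models the script stopping.
    return "script stops";
}

var log = [];
// Condition becomes true after 300 fake ms: the then callback runs.
var a = simulateWaitFor(function (t) { return t >= 300; },
    function () { log.push("inner then"); }, null, 1000);
// Condition never becomes true and there is no onTimeout, as in the script above.
var b = simulateWaitFor(function () { return false; },
    function () { log.push("never"); }, null, 1000);
// Same, but with an onTimeout callback acting as the error catch.
var c = simulateWaitFor(function () { return false; }, null,
    function () { log.push("onTimeout"); }, 1000);

console.log(a, b, c, log);
```

Case b corresponds to the script above: the check always returns false and onTimeout is null, so neither "inner then" nor "outer" would be printed.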
As of 1.1-beta3 there are the following additional asynchronous functions that don't follow the rule of thumb:
Casper module: back(), forward(), reload(), repeat(), start(), withFrame(), withPopup()
Tester module: begin()
If you're not sure, look into the source code to see whether a specific function uses then() or wait().
Event listeners can be registered using casper.on(listenerName, callback) and they will be triggered using casper.emit(listenerName, values). As far as the internals of CasperJS are concerned, they are not asynchronous. The asynchronous handling comes from the functions in which those emit() calls are made. CasperJS simply passes most PhantomJS events through, so this is where those are asynchronous.
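The synchronous nature of emit() can be illustrated outside of CasperJS. In this sketch Node's EventEmitter is used only as a stand-in with the same on()/emit() shape; the event name "my.event" and the value are made up for the example.

```javascript
// Stand-in demonstration using Node's EventEmitter, not CasperJS itself.
var EventEmitter = require("events").EventEmitter;

var emitter = new EventEmitter();
var order = [];

emitter.on("my.event", function (value) {
    order.push("listener got " + value);
});

order.push("before emit");
emitter.emit("my.event", 42);   // the listener runs right here, before emit() returns
order.push("after emit");

console.log(order.join(" | ")); // before emit | listener got 42 | after emit
```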
The control or execution flow is the way CasperJS executes the script. When we break out of the control flow, we need to manage a second flow (or even more). This will complicate the development and maintainability of the script immensely.
As an example, say you want to call an asynchronous function that is defined somewhere. Let's assume that there is no way to rewrite the function so that it is synchronous.
function longRunningFunction(callback) {
    ...
    callback(data);
    ...
}
var result;
casper.start(url, function(){
    longRunningFunction(function(data){
        result = data;
    });
}).then(function(){
    this.open(urlDependsOnFunResult???);
}).then(function(){
    // do something with the dynamically opened page
}).run();
Now we have two flows which depend on one another.
Another way to split the flow directly is to use the JavaScript functions setTimeout() and setInterval(). Since CasperJS provides waitFor(), there is no need to use those.
When a control flow must be merged back into the CasperJS flow, the obvious solution is to set a global variable in one flow and wait in the other for it to be set.
The example is the same as in the previous question:
var result;
casper.start(url, function(){
    longRunningFunction(function(data){
        result = data;
    });
}).waitFor(function check(){
    return result; // `undefined` is evaluated to `false`
}, function then(){
    this.open(result.url);
}, null, 20000).then(function(){
    // do something with the dynamically opened page
}).run();
Technically, nothing is asynchronous in the tester module. Calling test.begin() simply executes the callback. Only when the callback itself uses asynchronous code (meaning test.done() is called asynchronously inside a single begin() callback) can the other begin() test cases be added to the test case queue.
That is why a single test case usually consists of a complete navigation with casper.start() and casper.run() and not the other way around:
casper.test.begin("description", function(test){
    casper.start("http://example.com").run(function(){
        test.assert(this.exists("a"), "At least one link exists");
        test.done();
    });
});
It's best to stick to nesting a complete flow inside of begin(), since the start() and run() calls won't be mixed between multiple flows. This enables you to use multiple complete test cases per file.
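A sketch of what two complete test cases in one file could look like, following the same shape as the example above. registerTestCases, the second test case and its "h1" selector are assumptions for illustration; the casper instance is passed in as a parameter so the structure is easy to see.

```javascript
// Sketch: two self-contained test cases, each with its own start()/run() flow.
// The second case and the "h1" selector are hypothetical.
function registerTestCases(casper) {
    casper.test.begin("links exist", function (test) {
        casper.start("http://example.com").run(function () {
            test.assert(this.exists("a"), "At least one link exists");
            test.done();   // signals that this test case is finished
        });
    });

    casper.test.begin("headline exists", function (test) {
        casper.start("http://example.com").run(function () {
            test.assert(this.exists("h1"), "A headline exists");
            test.done();
        });
    });
}
```

Each begin() callback contains its own navigation, so the start() and run() calls of one test case never interleave with those of another.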
Notes: