How to download a CSV file after login using CasperJS

I want to download a CSV file using CasperJS. This is what I wrote:

var login_id = "my_user_id";
var login_password = "my_password";

var casper = require('casper').create();

casper.userAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36 ');

casper.start("http://eoddata.com/symbols.aspx",function(){
    this.evaluate(function(id,password) {
        document.getElementById('tl00_cph1_ls1_txtEmail').value = id;
        document.getElementById('ctl00_cph1_ls1_txtPassword').value = password;
        document.getElementById('ctl00_cph1_ls1_btnLogin').submit();

    }, login_id, login_password);
});

casper.then(function(){
    this.wait(3000, function() {
        this.echo("Wating...");
    });
});

casper.then(function(){
    this.download("http://eoddata.com/Data/symbollist.aspx?e=NYSE","nyse.csv");
});

casper.run();

I got nyse.csv, but the file was the HTML registration page of the website.

It seems the login process fails. How can I log in correctly and save the CSV file?

Update 2015/05/13

Following @Darren's advice, I rewrote the login step like this:

casper.start("http://eoddata.com/symbols.aspx");
casper.waitForSelector("form input[name = ctl00$cph1$ls1$txtEmail ]", function() {
  this.fillSelectors('form', {
    'input[name = ctl00$cph1$ls1$txtEmail ]' : login_id,
    'input[name = ctl00$cph1$ls1$txtPassword ]' : login_password,
    }, true);
});

This code ends with the error "Wait timeout of 5000ms expired, exiting." As far as I understand, the error means that the CSS selector did not match any element. How can I fix this problem?

Update 2015/05/18

I changed the code to this:

var fs = require('fs'); // PhantomJS file system module, needed for fs.write() below

casper.waitForSelector("form input[name = ctl00$cph1$ls1$txtEmail]", function() {
    this.fillSelectors('form', {
        'input[name = ctl00$cph1$ls1$txtEmail]' : login_id,
        'input[name = ctl00$cph1$ls1$txtPassword]' : login_password,
    }, true);
}, function() {
    fs.write("timeout.html", this.getHTML(), "w");
    casper.capture("timeout.png");
});

I inspected timeout.html with Chrome Developer Tools and Firebug, and confirmed several times that the input element is there:

<input name="ctl00$cph1$ls1$txtEmail" id="ctl00_cph1_ls1_txtEmail" style="width:140px;" type="text">

How can I fix this problem? I have already spent several hours on this issue.

Update 2015/05/19

Thanks to Darren, Urarist and Artjom, I got rid of the timeout error, but there is still another problem.

The downloaded CSV file was still the registration HTML page, so I rewrote the code like this to find the cause:

casper.waitForSelector("form input[name ='ctl00$cph1$ls1$txtEmail']", function() {
    this.fillSelectors('form', {
        "input[name ='ctl00$cph1$ls1$txtEmail']" : login_id,
        "input[name ='ctl00$cph1$ls1$txtPassword']" : login_password,
    }, true);
});/*, function() {
    fs.write("timeout.html", this.getHTML(), "w");
    casper.capture("timeout.png");
});*/

casper.then(function(){
    fs.write("logined.html", this.getHTML(), "w");
});

In logined.html the user email was filled in correctly, but the password was not. Does anyone have a guess at the cause?

ironsand asked May 11 '15 06:05


2 Answers

The trick is to log in successfully. There are multiple ways to log in. I tried several, and the only one that works on this page is triggering the form submission with the Enter key. This is done with the PhantomJS page.sendEvent() function. The fields can be filled using casper.sendKeys().

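// Placeholder credentials, as in the question
var login_id = "my_user_id";
var login_password = "my_password";

var casper = require('casper').create();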

casper.userAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36 ');

casper.start("http://eoddata.com/symbols.aspx",function(){
    this.sendKeys("#ctl00_cph1_ls1_txtEmail", login_id);
    this.sendKeys("#ctl00_cph1_ls1_txtPassword", login_password, {keepFocus: true});
    this.page.sendEvent("keypress", this.page.event.key.Enter);
});

casper.waitForUrl(/myaccount/, function(){
    this.download("http://eoddata.com/Data/symbollist.aspx?e=NYSE", "nyse.csv");
});

casper.run();

It seems necessary to wait for that specific page. For some reason, CasperJS does not notice that a new page was requested, so the usual then() step handling does not kick in.
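A failed login is easier to spot if the wait also has a timeout handler; otherwise the script just exits and it is unclear how far it got. The following is only a sketch of one way to wire that up, assuming the post-login URL still contains "myaccount". The helper name downloadAfterLogin, the screenshot file name and the 10-second timeout are made up for illustration; waitForUrl(), capture(), getCurrentUrl(), echo() and download() are regular CasperJS methods.

```javascript
// Hypothetical helper (not part of CasperJS): queue a download that only
// runs once the browser has reached the post-login page.
function downloadAfterLogin(casper, csvUrl, target) {
    casper.waitForUrl(/myaccount/, function onLoggedIn() {
        // We are on the account page, so the login presumably worked.
        this.download(csvUrl, target);
    }, function onTimeout() {
        // Still not on the account page: keep evidence instead of saving
        // the registration page under a .csv name.
        this.capture("login-failed.png");
        this.echo("Login failed, still at: " + this.getCurrentUrl());
    }, 10000); // assumed timeout of 10 s; the default is 5 s
    return casper;
}
```

It would be called once, after the login step has been queued, in place of the bare waitForUrl() call above: downloadAfterLogin(casper, "http://eoddata.com/Data/symbollist.aspx?e=NYSE", "nyse.csv").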

Other approaches I tried, none of which worked on this page:

  • Filling and submitting the form with casper.fillSelectors()
  • Filling through the DOM with casper.evaluate() and submitting by clicking on the login button with casper.click()
  • Mixing all of the above.
Artjom B. answered Oct 13 '22 15:10


At first glance your script looks reasonable. But there are a couple of ways to make it simpler, which should also make it more robust.

First, instead of your evaluate() call, use fillSelectors():

this.fillSelectors('form', {
  'input[name = id ]' : login_id,
  'input[name = pw ]' : login_password,
  }, true);

The true parameter means the form is submitted after it is filled. (I guessed the field names, but I'm fairly sure you could continue to use the element IDs if you prefer.)

Even better, do not fill the form until you are sure it is there:
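For example, with the element IDs quoted in the question, the same call might look like the sketch below. The wrapper function fillLoginById is hypothetical and only there to keep the snippet self-contained; fillSelectors() is the CasperJS method used above. The 'form' selector picks the first form element on the page, which is assumed here to be the login form.

```javascript
// Hypothetical wrapper: fill the login fields by element ID and submit.
// The IDs are the ones shown in the question; adjust if the page differs.
function fillLoginById(casper, id, password) {
    casper.fillSelectors('form', {
        '#ctl00_cph1_ls1_txtEmail': id,
        '#ctl00_cph1_ls1_txtPassword': password
    }, true); // true = submit the form after filling it
    return casper;
}
```

Inside a step it would be called as fillLoginById(this, login_id, login_password).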

casper.waitForSelector("form input[name = id ]", function() {
  this.fillSelectors('form', {
    'input[name = id ]' : login_id,
    'input[name = pw ]' : login_password,
    }, true);
});

This matters if the login form is inserted dynamically by JavaScript (possibly even from an Ajax call), in which case it won't exist as soon as the page has loaded.

The other change is to replace casper.wait() with one of the casper.waitForXXX() functions, to make sure the CSV file link is there before you try to download it. A fixed 3-second wait fails if the remote server takes longer than 3 seconds to respond, and wastes time if it responds in 1 second.
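As a rough sketch of that second change, the fixed wait(3000) step can give way to a waitForSelector() call that proceeds as soon as the link shows up and reports a timeout otherwise. The helper name downloadWhenLinkAppears is made up, and the link selector in the example call is a guess; the real page markup would need checking.

```javascript
// Hypothetical helper: download only once an element matching
// linkSelector exists, instead of sleeping for a fixed time.
function downloadWhenLinkAppears(casper, linkSelector, csvUrl, target) {
    casper.waitForSelector(linkSelector, function onFound() {
        this.download(csvUrl, target);
    }, function onTimeout() {
        this.echo("Gave up waiting for " + linkSelector);
    });
    return casper;
}

// Example call (the selector is an assumption, not taken from the site):
// downloadWhenLinkAppears(casper, 'a[href*="symbollist.aspx"]',
//     "http://eoddata.com/Data/symbollist.aspx?e=NYSE", "nyse.csv");
```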

UPDATE: A timeout on the waitFor lines tells you that the root of the problem is a selector that does not match anything on the page. This, I find, is the biggest time sink when writing Casper scripts. (I recently envisaged a tool that could automate the search for a near-miss, but couldn't get anyone else interested, and it is a bit too big a project for one person.) So your troubleshooting starting points are:

  • Add an error handler to the timing-out waitFor() command and take a screenshot (casper.capture()).
  • Dump the HTML. If you know the ID of a parent div, you can pass it to narrow down how much HTML you have to read through.
  • Open the page with Firebug (or the tool of your choice) and poke around to find what is there. (Remember that you can type a jQuery command, or a document.querySelector() command, in the console, which is a good way to find the correct selector interactively.)
  • Try with SlimerJS, instead of PhantomJS (especially if still using PhantomJS 1.x). It might be that the site uses some feature that is only supported in newer browsers.
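One near-miss worth ruling out for ASP.NET-style names such as ctl00$cph1$ls1$txtEmail: in a CSS attribute selector, an unquoted value has to be a valid identifier, and $ is not allowed in one, so the value needs quotes. The sketch below builds a quoted selector from a field name; nameSelector is a hypothetical helper, not a CasperJS function.

```javascript
// Hypothetical helper: build an attribute selector with a quoted value,
// e.g. input[name="ctl00$cph1$ls1$txtEmail"].
function nameSelector(tag, name) {
    // Escape backslashes and double quotes so the name is safe inside "...".
    var escaped = String(name).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
    return tag + '[name="' + escaped + '"]';
}
```

The result can be pasted into document.querySelector() in the browser console to check that it matches, and then passed to waitForSelector() or fillSelectors().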
Darren Cook answered Oct 13 '22 16:10
