I need to download the following webpage: http://m.10bet.com/#leage_panel#10096
It is a sports betting page and I need the quotes. At first glance this seems pretty simple. However, a plain download only returns the initial page; the rest is loaded by JavaScript afterwards (you can check this with e.g. your browser's developer tools).
Instead, I will need to use a headless browser capable of evaluating JavaScript. HtmlUnit for Java is inadequate since it does not offer robust JavaScript functionality. Therefore PhantomJS in combination with CasperJS is my current choice. I use CasperJS with the following script:
var casper = require('casper').create();
casper.start('http://m.10bet.com/#leage_panel#10096', function() {
    var url = 'http://m.10bet.com/#leage_panel#10096';
    this.download(url, '10bet.html');
});
casper.run(function() {
    this.echo('Done.').exit();
});
However, this script does not load the complete page, just the initial page. How do I load the complete webpage as it is presented in the browser?
That script looks like a good start, but as soon as your (HTML) page loads, the (CasperJS) script stops, because you have not given it any more instructions. The crudest way to fix this would be to go to sleep for a couple of seconds, then scrape the page:
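var casper = require('casper').create();
var fs = require('fs');

casper.start('http://m.10bet.com/#leage_panel#10096', function() {
    // Crude: give the page's JavaScript two seconds to run, then save the DOM.
    this.wait(2000, function() {
        // getHTML() returns the current (rendered) page content; 'w' = write mode.
        fs.write('10bet.html', this.getHTML(), 'w');
    });
});

casper.run();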
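A fixed 2000 ms delay is crude for a couple of reasons: if the page takes longer than that to finish loading, you scrape an incomplete page; if it finishes sooner, you wait longer than necessary.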
So it is better to identify something on the page that you want and need to be there, and then use one of Casper's waitForXXX() functions. See the API docs starting here: http://casperjs.readthedocs.org/en/latest/modules/casper.html#waitfor
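As a rough sketch of that idea, the script below uses waitForSelector() in place of the fixed sleep. The selector '#league_block' and the 10-second limit are assumptions, not values checked against the site, and saveWhenReady is a hypothetical helper name. It takes the casper and fs objects as arguments only so that the steps are easy to reuse.

```javascript
// Sketch: wait for an element instead of sleeping for a fixed time.
// ASSUMPTIONS: '#league_block' is a guess at an element that only exists once
// the quotes have loaded; 10000 ms is an arbitrary upper limit.
function saveWhenReady(casper, fs, url, selector, outFile) {
    casper.start(url);
    casper.waitForSelector(selector, function then() {
        // Runs as soon as the selector matches something in the page.
        fs.write(outFile, this.getHTML(), 'w');
    }, function onTimeout() {
        // Runs instead if nothing matched within the time limit.
        this.echo('Timed out waiting for ' + selector);
    }, 10000);
    return casper;
}

// Only start scraping when run under CasperJS (PhantomJS defines `phantom`).
if (typeof phantom !== 'undefined') {
    saveWhenReady(require('casper').create(), require('fs'),
        'http://m.10bet.com/#leage_panel#10096', '#league_block',
        '10bet.html').run();
}
```

If the selector never appears, the timeout callback runs instead of writing a partial page, which makes a wrong selector easy to notice.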
As another point, I'm guessing you don't actually want the whole HTML page, just the data in it. getHTML() takes a parameter to filter what is received. E.g. in your case getHTML('#league_block') might be much more useful. Again, see the API docs for more ideas.
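A minimal sketch of saving only that fragment could look like the following. It assumes '#league_block' really is the element wrapping the quotes; saveFragment and the output file name are hypothetical. Because getHTML() throws when its selector matches nothing, the script waits for the element first.

```javascript
// Sketch: save only the fragment that holds the data, not the whole page.
// ASSUMPTION: '#league_block' wraps the quotes on this page.
function saveFragment(casper, fs, selector, outFile) {
    // getHTML(selector) returns the inner HTML of the first matching element;
    // passing true as the second argument returns the outer HTML instead.
    var fragment = casper.getHTML(selector, true);
    fs.write(outFile, fragment, 'w');
    return fragment;
}

// Only start scraping when run under CasperJS (PhantomJS defines `phantom`).
if (typeof phantom !== 'undefined') {
    var casper = require('casper').create();
    var fs = require('fs');
    casper.start('http://m.10bet.com/#leage_panel#10096');
    casper.waitForSelector('#league_block', function () {
        saveFragment(this, fs, '#league_block', '10bet-league.html');
    });
    casper.run();
}
```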