Overview
I am trying to create a very basic scraper with PhantomJS and the pjscrape framework.
My Code
pjs.config({
    timeoutInterval: 6000,
    timeoutLimit: 10000,
    format: 'csv',
    csvFields: ['productTitle', 'price'],
    writer: 'file',
    outFile: 'D:\\prod_details.csv'
});
pjs.addSuite({
    title: 'ChainReactionCycles Scraper',
    url: productURLs, // This is an array of URLs; two examples are defined below
    scrapers: [
        function() {
            var results = [];
            var linkTitle = _pjs.getText('#ModelsDisplayStyle4_LblTitle');
            var linkPrice = _pjs.getText('#ModelsDisplayStyle4_LblMinPrice');
            results.push([linkTitle[0], linkPrice[0]]);
            return results;
        }
    ]
});
URL Arrays Used
This first array DOES NOT WORK and fails after the 3rd or 4th URL.
var productURLs = ["8649","17374","7327","7325","14892","8650","8651","14893","18090","51318"];
for(var i=0;i<productURLs.length;++i){
productURLs[i] = 'http://www.chainreactioncycles.com/Models.aspx?ModelID=' + productURLs[i];
}
This second array WORKS and does not fail, even though it is from the same site.
var categoriesURLs = ["304","2420","965","518","514","1667","521","1302","1138","510"];
for(var i=0;i<categoriesURLs.length;++i){
categoriesURLs[i] = 'http://www.chainreactioncycles.com/Categories.aspx?CategoryID=' + categoriesURLs[i];
}
Problem
When iterating through productURLs, the PhantomJS page.open optional callback automatically reports failure, even when the page hasn't finished loading. I know this because I started the script while running an HTTP debugger, and the HTTP requests were still running even after PhantomJS had reported a page load failure. However, the code works fine when run with categoriesURLs.
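To narrow down whether this is a pjscrape problem or a PhantomJS one, the product URLs could be opened one at a time in a plain PhantomJS script, outside pjscrape. The following is only a minimal diagnostic sketch, assuming PhantomJS 1.x; the file name diagnose.js and the short subset of IDs are purely illustrative.

// diagnose.js - run with: phantomjs diagnose.js
// Opens each Models.aspx URL in sequence and logs the status reported by
// page.open, so failures can be compared with what pjscrape reports.
var page = require('webpage').create();

// Illustrative subset of the productURLs IDs from the question.
var ids = ["8649", "17374", "7327", "7325", "14892"];
var urls = [];
for (var i = 0; i < ids.length; ++i) {
    urls.push('http://www.chainreactioncycles.com/Models.aspx?ModelID=' + ids[i]);
}

function openNext(index) {
    if (index >= urls.length) {
        phantom.exit();
        return;
    }
    page.open(urls[index], function (status) {
        console.log(status + ' -> ' + urls[index]);
        openNext(index + 1);
    });
}

openNext(0);

If this plain script also logs "fail" for the same URLs while the HTTP debugger still shows requests in flight, the failure would sit in PhantomJS's page loading itself rather than in pjscrape.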
Assumptions
Possible Solutions
These are the solutions I have tried thus far:
- Setting page.options.loadImages = false
- Adjusting timeoutInterval in pjs.config; this was apparently not useful, as the error generated was a page.open failure and NOT a timeout failure (see the error-hook sketch below).
Any ideas?
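Since the reported error is a page.open failure rather than a timeout, PhantomJS's resource hooks can at least show which request is failing and why. This is a hedged sketch of a standalone script, assuming PhantomJS 1.9+ (which provides onResourceError and onResourceTimeout); the script name and the single ModelID are just illustrative values taken from the array above.

// error-hooks.js - illustrative standalone script, not part of the pjscrape suite.
var page = require('webpage').create();

page.onResourceError = function (resourceError) {
    // Reports WHY a resource failed (connection refused, operation canceled, ...),
    // which the bare 'fail' status from page.open does not reveal.
    console.log('Resource error: ' + resourceError.url + ' - ' +
        resourceError.errorCode + ' ' + resourceError.errorString);
};

page.onResourceTimeout = function (request) {
    console.log('Resource timed out: ' + request.url);
};

page.open('http://www.chainreactioncycles.com/Models.aspx?ModelID=8649', function (status) {
    console.log('page.open status: ' + status);
    phantom.exit();
});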
Update
The problem was caused by PhantomJS and has now been resolved: I now use PhantomJS v2.0.