I'm trying to write an application that scrapes a Meteor web page. This is rather difficult because Meteor pages are rendered entirely by client-side JavaScript. Is there some way to render the page first with some sort of scraper?
I'll probably be doing it with Node.js, if that helps.
Thanks
You could use PhantomJS to render the page. Here is an example designed specifically for Meteor pages (adapted from the spiderable package) that captures their HTML:
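// Run with: phantomjs script.js
var system = require('system');
var page = require('webpage').create();

system.stderr.writeLine('Loading a web page');

page.open("http://localhost:3000", function (status) {
    if (status !== 'success') {
        system.stderr.writeLine('Failed to load page: ' + status);
        phantom.exit(1);
        return;
    }

    var started = Date.now();

    // Poll every 100 ms until Meteor is connected and every
    // subscription has delivered its data, then print the HTML.
    setInterval(function () {
        // This function runs inside the page, where Meteor is defined.
        var ready = page.evaluate(function () {
            if (typeof Meteor !== 'undefined'
                && typeof Meteor.status !== 'undefined'
                && Meteor.status().connected) {
                // Apply pending reactive updates (Deps is called Tracker in newer Meteor).
                Deps.flush();
                return DDP._allSubscriptionsReady();
            }
            return false;
        });
        // Progress goes to stderr so stdout carries only the HTML.
        system.stderr.writeLine('Ready ' + ready);
        if (ready) {
            console.log(page.content);
            phantom.exit();
        } else if (Date.now() - started > 30000) {
            system.stderr.writeLine('Timed out waiting for subscriptions');
            phantom.exit(1);
        }
    }, 100);
});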
Run the script with phantomjs script.js and it prints the rendered HTML of the Meteor page to standard output. From Node you can launch it with require('child_process').exec and capture that output from the child process's stdout.
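A minimal sketch of the Node side, assuming `phantomjs` is on your PATH and the PhantomJS script is saved as `script.js`. It uses `execFile`, a sibling of `exec` that runs the binary directly without a shell. `renderMeteorPage` is a hypothetical helper name, and the 10 MB buffer limit is an arbitrary choice.

```javascript
// Hypothetical helper: runs a PhantomJS script and resolves with its stdout.
// Assumes `phantomjs` is installed and on the PATH (override with `binary`).
const { execFile } = require('child_process');

function renderMeteorPage(scriptPath, binary = 'phantomjs') {
  return new Promise((resolve, reject) => {
    execFile(
      binary,
      [scriptPath],
      // Rendered pages can be large; raise the stdout buffer limit (arbitrary 10 MB).
      { maxBuffer: 10 * 1024 * 1024, encoding: 'utf8' },
      (err, stdout, stderr) => {
        if (err) {
          // Spawn failure or non-zero exit code (err.code holds the exit code).
          err.stderr = stderr;
          reject(err);
          return;
        }
        resolve(stdout);
      }
    );
  });
}

module.exports = { renderMeteorPage };
```

Usage might look like `renderMeteorPage('script.js').then((html) => console.log(html))`. Everything the PhantomJS script writes to stdout is captured, not just the HTML, so any extra logging printed there ends up in the result.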