
Web Scrape Meteor Pages

I'm trying to write an application that scrapes a Meteor webpage. This is difficult because Meteor pages are initially rendered entirely by JavaScript. Is there some way to render the page first with some sort of scraper?

I'll probably do it with Node, if that helps.

Thanks

mjkaufer asked Apr 08 '26 22:04


1 Answer

You could use PhantomJS to render the webpage. Here is an example, taken from Meteor's spiderable package and designed specifically for Meteor pages, that captures the rendered HTML:

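// Run with: phantomjs script.js
var page = require('webpage').create();

console.log('Loading a web page');

page.open("http://localhost:3000", function(status) {
     // Stop early if the page could not be loaded at all.
     if (status !== 'success') {
          console.log('Failed to load the page');
          phantom.exit(1);
     }
});

// Poll every 100 ms until Meteor is connected and all subscriptions are ready.
setInterval(function() {
     var ready = page.evaluate(function () {
          if (typeof Meteor !== 'undefined' 
               && typeof(Meteor.status) !== 'undefined' 
               && Meteor.status().connected) {
               // Flush pending reactive updates so the DOM is current.
               // (Deps was renamed Tracker in later Meteor releases.)
               Deps.flush();
               return DDP._allSubscriptionsReady();
          }
          return false;
     });

     console.log("Ready", ready);

     if (ready) {
          // Print the fully rendered HTML and quit.
          var out = page.content;
          console.log(out);
          phantom.exit();
     }
}, 100);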

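As written, the script prints the HTML to stdout, so you could wrap it and capture the output from Node using require('child_process').exec and reading stdout. Note that the progress messages ("Loading a web page", "Ready ...") are written to stdout as well.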

You can run the code with phantomjs script.js, and it will give you back the HTML of a Meteor page.
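
To capture that output from Node, something along these lines might work. This is only a sketch: it assumes the phantomjs binary is on your PATH and that the script above is saved as script.js. The helper names extractHtml and scrapeMeteorPage are made up for illustration, and the filtering simply drops the progress lines that the script logs before the HTML.

```javascript
// Sketch only: assumes `phantomjs` is on the PATH.
var execFile = require('child_process').execFile;

// Hypothetical helper: drop the progress lines the script logs
// ("Loading a web page", "Ready false", ...) and keep the HTML.
function extractHtml(stdout) {
  return stdout
    .split(/\r?\n/)
    .filter(function (line) {
      return line !== 'Loading a web page' &&
        !/^Ready (true|false|null|undefined)$/.test(line);
    })
    .join('\n')
    .trim();
}

// Hypothetical helper: run the PhantomJS script and hand the
// rendered HTML to callback(err, html).
function scrapeMeteorPage(scriptPath, callback) {
  // Raise maxBuffer because a rendered page can exceed the default limit.
  var options = { maxBuffer: 10 * 1024 * 1024 };
  execFile('phantomjs', [scriptPath], options, function (err, stdout) {
    if (err) {
      return callback(err);
    }
    callback(null, extractHtml(stdout));
  });
}

// Usage (not executed here):
// scrapeMeteorPage('script.js', function (err, html) {
//   if (err) { throw err; }
//   console.log(html.length + ' characters of HTML');
// });
```

Removing the console.log progress lines from the PhantomJS script would make the filtering step unnecessary.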

Tarang answered Apr 11 '26 16:04


