Using Firebug, I can inject a jQuery script tag into the head of a web page and then run a script to scrape that page and the pages it links to.
How do I begin writing this script in jQuery, or in JavaScript in general? Is there an interface in jQuery or JavaScript that lets me use XPath to access the elements on a page (and on the pages it links to)?
First, you'll need a JavaScript runtime outside of the browser. The most common is Node.js. Next, you'll need a way to build the DOM outside the browser, since no browser is there to parse the page for you. This is typically done using jsdom.
So, your script should download the page (jsdom does this for you, but you can use request), build a DOM from it, and then query that DOM, for example with jQuery.
Here is a sample Node.js script:
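// jsdom.env is the legacy jsdom API (releases before 10).
var jsdom = require("jsdom");

// Load the page, inject jQuery into it, then run the callback.
jsdom.env("http://nodejs.org/dist/", [
  'http://code.jquery.com/jquery-1.5.min.js'
], function(errors, window) {
  if (errors) { return console.error(errors); }
  // window.$ is the injected jQuery; count every link on the page.
  console.log("there have been", window.$("a").length, "nodejs releases!");
});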
Save it as scrape.js and run it like so:
$ node scrape.js
Don't forget to install jsdom first. Pin a 9.x release, since later versions no longer provide jsdom.env:
$ npm install --production jsdom@9
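The question also asks about the pages a page links to. Here is a rough sketch of how following links could work, using only Node's built-in URL class. The names resolveLinks, crawl, fetchHrefs and maxPages are made up for this example and are not part of jsdom or jQuery. fetchHrefs is an assumed callback that you would write yourself, for instance by loading the page with jsdom and collecting the href of every a element. Note that this sketch does not restrict itself to one host, so maxPages is the only thing that stops it.

```javascript
// Hypothetical helper: turn the href values found on a page into a clean
// list of absolute http(s) URLs, using Node's built-in URL class.
function resolveLinks(baseUrl, hrefs) {
  var seen = new Set();
  var result = [];
  hrefs.forEach(function (href) {
    var url;
    try {
      url = new URL(href, baseUrl); // resolves relative hrefs against the page URL
    } catch (e) {
      return; // skip hrefs that cannot be parsed
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") return;
    url.hash = ""; // "#section" links point at the same page
    if (!seen.has(url.href)) {
      seen.add(url.href);
      result.push(url.href);
    }
  });
  return result;
}

// Hypothetical crawl loop. fetchHrefs(url) is an assumed callback that
// returns a Promise of the raw href strings found on that page.
async function crawl(startUrl, fetchHrefs, maxPages) {
  var queue = [new URL(startUrl).href];
  var visited = new Set();
  while (queue.length > 0 && visited.size < maxPages) {
    var url = queue.shift();
    if (visited.has(url)) continue; // already scraped this page
    visited.add(url);
    var links = resolveLinks(url, await fetchHrefs(url));
    links.forEach(function (link) {
      if (!visited.has(link)) queue.push(link);
    });
  }
  return Array.from(visited);
}
```

The visited set is what keeps the loop from scraping the same page twice when two pages link to each other.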