How can one scrape a web page with jQuery and XPath?

Question

I can inject a jQuery script tag into the head of a web page via Firebug. Then I can run a script to scrape that page and the pages it links to.

How do I begin writing this script in jQuery, or in JavaScript in general? Is there an interface in either jQuery or JavaScript that lets me use XPath to access the elements on a page (and on the pages it links to)?

JP Richardson · Accepted Answer

First, you'll need a JavaScript runtime outside of the browser. The most common is Node.js. Next, you'll need a way to build a browser-style DOM inside that runtime. This is typically done using jsdom.

So, your script should:

download the HTML page (jsdom does this for you, but you can also use request)
create a browser-style DOM from that HTML
query the DOM using jQuery

Here is a sample Node.js script:

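var jsdom = require("jsdom");

// jsdom.env is the older callback API; jsdom 10 and later replaced it with the JSDOM class.
// It downloads the page, builds a DOM, and loads the listed scripts into it.
jsdom.env("http://nodejs.org/dist/", [
    'http://code.jquery.com/jquery-1.5.min.js'
  ], function(errors, window) {
  if (errors) {
    console.error(errors);
    return;
  }
  // Count every link in the directory listing.
  console.log("there have been", window.$("a").length, "nodejs releases!");
});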

Save the jsdom sample as scrape.js and run it like so:

$ node scrape.js

Don't forget to install jsdom first:

$ npm install --production jsdom
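
For the XPath part of the question: the standard DOM call is document.evaluate, which browsers provide. Below is a minimal sketch of a wrapper around it. The name xpathSelect is hypothetical, and the sketch assumes the document object you pass in implements document.evaluate (I believe jsdom does, but check the version you install).

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in the DOM XPath API.
// Written as a number so the helper does not rely on a global XPathResult.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: return every node matching `expression` as a plain array.
// `doc` is any DOM Document that implements document.evaluate.
function xpathSelect(doc, expression, contextNode) {
  const snapshot = doc.evaluate(
    expression,
    contextNode || doc, // search from the document root by default
    null,               // no namespace resolver
    ORDERED_NODE_SNAPSHOT_TYPE,
    null                // no existing result object to reuse
  );
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}
```

Inside the jsdom callback this would be called as xpathSelect(window.document, "//a[@href]"). Because the result is an ordinary array of elements, it can be wrapped with window.$(nodes) if you want to keep using jQuery methods on the matches.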

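To scrape the pages a page links to, the href values you collect usually have to be turned into absolute URLs first, since many of them are relative. Here is a rough sketch using the URL class built into Node.js. The name resolveLinks is hypothetical, and dropping non-http links and fragments is my own assumption about what a crawler would want.

```javascript
// Hypothetical helper: turn raw href values into unique absolute http(s) URLs.
function resolveLinks(hrefs, pageUrl) {
  const seen = new Set();
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, pageUrl); // resolves relative hrefs against the page URL
    } catch (err) {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    // Skip mailto:, javascript: and other non-web schemes.
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    url.hash = ""; // "#fragment" variants point at the same page
    seen.add(url.href);
  }
  return Array.from(seen);
}
```

In the jsdom callback, the hrefs could be gathered with window.$("a").map(function () { return window.$(this).attr("href"); }).get() and passed in together with the URL of the page that was loaded. Each returned URL can then be scraped the same way as the first page.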
Tags:

javascript

jquery

web-scraping

xpath

dangerChihuahua007

1 Answers

JP Richardson
