I need to parse a simple web page and get data from the HTML, such as "src", "data-attr", etc. How can I do this most efficiently using Node.js? If it helps, I'm using Node.js 0.8.x.
P.S. This is the site I'm parsing. I want to get a list of the current tracks and make my own HTML5 app for listening on mobile devices.
Web scraping is the process of extracting data from a website in an automated way, and Node.js can be used for it. Even though other languages and frameworks are more popular for web scraping, Node.js can do the job well too.
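Every scraper starts by downloading the page's HTML. Here is a minimal sketch that uses only Node's core `http`/`https` modules. It assumes a modern Node.js release rather than 0.8. `fetchHtml` is a hypothetical helper name. The sketch does not follow redirects, and it buffers the whole body in memory.

```javascript
// Minimal sketch: download a page's HTML using only Node's core modules.
// fetchHtml is a hypothetical helper; it does not follow redirects.
const http = require('http');
const https = require('https');

function fetchHtml(url, callback) {
  // Pick the transport from the URL scheme.
  const client = url.startsWith('https:') ? https : http;
  client
    .get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        callback(new Error('Unexpected status ' + res.statusCode));
        return;
      }
      res.setEncoding('utf8');
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => callback(null, body));
    })
    .on('error', callback); // DNS failures, refused connections, etc.
}
```

In practice, Request (or the built-in `fetch` in recent Node.js versions) takes care of redirects and other details for you. This sketch only shows what happens underneath.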
Caching is one of the common ways of improving Node.js performance. It can be done for both client-side and server-side web applications. However, server-side caching is often the preferred choice for Node.js performance optimization, because it can cover resources such as JavaScript files, CSS stylesheets, and HTML pages.
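For a scraper, server-side caching means your app does not refetch the same page on every request. Below is a minimal in-memory sketch with a time-to-live. It assumes a callback-style `fetch(url, callback)` function like Request's. `createCachedFetch` and the TTL value are hypothetical.

```javascript
// Minimal sketch of a server-side, in-memory cache with a time-to-live (TTL).
// createCachedFetch is a hypothetical helper: it wraps any callback-style
// fetchFn(url, callback) and reuses successful results for ttlMs milliseconds.
function createCachedFetch(fetchFn, ttlMs) {
  const cache = new Map(); // url -> { body, expires }

  return function cachedFetch(url, callback) {
    const hit = cache.get(url);
    if (hit && hit.expires > Date.now()) {
      // Serve from cache; defer the callback so it is always asynchronous.
      process.nextTick(() => callback(null, hit.body));
      return;
    }
    fetchFn(url, (err, body) => {
      if (err) return callback(err); // errors are not cached
      cache.set(url, { body, expires: Date.now() + ttlMs });
      callback(null, body);
    });
  };
}
```

This sketch does not deduplicate concurrent misses for the same URL. Expired entries are also never evicted, only overwritten. A long-running process that scrapes many URLs would therefore need a size limit.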
I have done this a lot. You'll want to use PhantomJS if the website that you're scraping relies heavily on JavaScript. Note that PhantomJS is not Node.js. It's a completely different JavaScript runtime. You can integrate through phantomjs-node or node-phantom, but they are both kinda hacky. YMMV with those. Avoid anything to do with jsdom. It'll cause you headaches, and this includes Zombie.js.
What you should use is Cheerio in conjunction with Request. This will be sufficient for most web pages.
I wrote a blog post on using Cheerio with Request: "Quick and Dirty Screen Scraping with Node.js". But, again, if the site is JavaScript-intensive, use PhantomJS in conjunction with CasperJS.
Hope this helps.
Snippet using Request and Cheerio:
```javascript
var request = require('request');
var cheerio = require('cheerio');

var searchTerm = 'screen+scraping';
var url = 'http://www.bing.com/search?q=' + searchTerm;

request(url, function (err, resp, body) {
  if (err) return console.error(err);
  var $ = cheerio.load(body);
  var links = $('.sb_tlst h3 a'); // use your CSS selector here
  links.each(function (i, link) {
    console.log($(link).text() + ':\n  ' + $(link).attr('href'));
  });
});
```
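You might want the "src" and "data-attr" values from the question without adding a dependency. This rough sketch swaps in a regular expression instead of a real HTML parser. It assumes attribute values are quoted with `"` or `'`. It does not check that a match actually sits inside a tag, and it is easily fooled by comments or scripts. `extractAttr` is a hypothetical helper name.

```javascript
// Naive, dependency-free sketch: collect every value of one attribute
// (e.g. "src" or "data-attr") from an HTML string using a regex.
// Assumes quoted values; a regex cannot parse HTML in general.
function extractAttr(html, name) {
  // Escape regex metacharacters in the attribute name.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Match name="value" or name='value' preceded by whitespace, so that
  // "src" does not also match "data-src". Values may span lines.
  const re = new RegExp('\\s' + escaped + '\\s*=\\s*(["\'])([\\s\\S]*?)\\1', 'gi');
  const values = [];
  let match;
  while ((match = re.exec(html)) !== null) {
    values.push(match[2]); // group 2 is the value between the quotes
  }
  return values;
}
```

For anything beyond a quick check, Cheerio's `$(selector).attr('src')` is the more reliable route, as in the snippet above.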
You could try PhantomJS. Here's the documentation for using it for screen scraping.