How can I scrape pages with dynamic content using node.js?

Tags: javascript, node.js, phantomjs, web-crawler

I am trying to scrape a website, but some of the elements are missing from what I get because they are created dynamically.

I use cheerio in Node.js, and my code is below:

```javascript
var request = require('request');
var cheerio = require('cheerio');

var url = "http://www.bdtong.co.kr/index.php?c_category=C02";

request(url, function (err, res, html) {
  if (err) return console.error(err);

  // cheerio parses the HTML exactly as the server sent it; no page scripts run
  var $ = cheerio.load(html);
  $('.listMain > li').each(function () {
    console.log($(this).find('a').attr('href'));
  });
});
```

This code prints nothing, because when the page is first loaded the `<ul id="store_list" class="listMain">` element is empty; its content has not been appended yet.

How can I get these elements using node.js? How can I scrape pages with dynamic content?
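To see why the list comes back empty, here is a self-contained sketch that uses only Node's built-in `http` module. It serves a hypothetical stand-in page (not the real site) whose `<ul>` is sent empty and filled by an inline script, fetches it the way `request` would, and counts `<li>` tags in the raw response. The page content, paths, and helper names (`fetchRaw`, `countListItems`, `demo`) are made up for illustration.

```javascript
const http = require('http');

// Hypothetical stand-in for the real page: the <ul> is sent empty and an inline
// script fills it only when a browser executes the page.
const PAGE = `<!doctype html>
<html><body>
<ul id="store_list" class="listMain"></ul>
<script>
  var ul = document.getElementById('store_list');
  ['/store/1', '/store/2'].forEach(function (href) {
    var li = document.createElement('li');
    var a = document.createElement('a');
    a.href = href;
    li.appendChild(a);
    ul.appendChild(li);
  });
</script>
</body></html>`;

// Fetch a URL and resolve with the raw body, exactly as the server sent it.
function fetchRaw(url) {
  return new Promise((resolve, reject) => {
    http.get(url, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

// Count <li> start tags in raw markup; the inline script is never executed here.
function countListItems(html) {
  return (html.match(/<li[\s>]/g) || []).length;
}

// Serve PAGE on a random local port, fetch it once, and report what was received.
function demo() {
  const server = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
    res.end(PAGE);
  });
  return new Promise((resolve, reject) => {
    server.listen(0, '127.0.0.1', async () => {
      try {
        const { port } = server.address();
        const html = await fetchRaw(`http://127.0.0.1:${port}/`);
        resolve({ hasList: html.includes('id="store_list"'), items: countListItems(html) });
      } catch (err) {
        reject(err);
      } finally {
        server.close();
      }
    });
  });
}

demo().then(({ hasList, items }) => {
  console.log(`list present: ${hasList}, <li> items in raw HTML: ${items}`);
});
```

The `<ul>` is present in what the server sends, but its items only exist after a browser runs the script, which is why a plain HTTP client plus a static parser cannot see them.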


asked Feb 26 '15 09:02

JayD

2 Answers

Here you go, using PhantomJS through the `phantom` npm module:

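```javascript
var phantom = require('phantom');

phantom.create(function (ph) {
  ph.createPage(function (page) {
    var url = "http://www.bdtong.co.kr/index.php?c_category=C02";
    page.open(url, function () {
      // Inject jQuery into the page so the same selector code can run there
      page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function () {
        page.evaluate(function () {
          // Runs inside the loaded page. console.log here would go to the
          // page's console, not Node's, so return the hrefs instead.
          return $('.listMain > li').map(function () {
            return $(this).find('a').attr('href');
          }).get();
        }, function (hrefs) {
          // Back in Node.js with the value returned from the page
          hrefs.forEach(function (href) {
            console.log(href);
          });
          ph.exit();
        });
      });
    });
  });
});
```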
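The `page.open` callback fires once the page has loaded. If the list is filled by a request that finishes later, evaluating immediately may still find it empty; a common remedy is to poll until the content appears. Below is a minimal, generic promise-based sketch. `waitFor` is a hypothetical helper (not part of the `phantom` module), and the simulated `fakeList` stands in for a real page.

```javascript
// Hypothetical helper: call `check` (sync or async) repeatedly until it returns a
// truthy value, then resolve with that value; reject once `timeoutMs` has passed.
function waitFor(check, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const attempt = async () => {
      try {
        const result = await check();
        if (result) return resolve(result);
        if (Date.now() >= deadline) {
          return reject(new Error(`waitFor: condition not met within ${timeoutMs} ms`));
        }
        setTimeout(attempt, intervalMs);
      } catch (err) {
        reject(err); // a throwing check aborts the wait
      }
    };
    attempt();
  });
}

// Simulated page: the "list" only gets its items 600 ms after start.
const fakeList = [];
setTimeout(() => fakeList.push('/store/1', '/store/2'), 600);

waitFor(() => (fakeList.length > 0 ? fakeList.slice() : null), { timeoutMs: 2000, intervalMs: 100 })
  .then((hrefs) => console.log('found', hrefs))
  .catch((err) => console.error(err.message));
```

With the callback-style API used in the answer, the check could wrap `page.evaluate` in a `new Promise`, passing the promise's `resolve` as the evaluate callback and returning the number of `.listMain > li` elements from inside the page.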


answered Sep 19 '22 13:09

Safi

Check out GoogleChrome/puppeteer, the Headless Chrome Node API.

It makes scraping pretty trivial. The following example scrapes the headline at npmjs.com (assuming the `.npm-expansions` element still exists):

```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto('https://www.npmjs.com/');

    // The callback runs inside the page, so it sees the live, script-built DOM
    const textContent = await page.evaluate(() => {
      return document.querySelector('.npm-expansions').textContent;
    });

    console.log(textContent); /* e.g. "No Problem Mate" */
  } finally {
    await browser.close();
  }
})();
```

`page.evaluate` runs the given function inside the page itself, so it works on the live DOM, including elements created dynamically by the page's own scripts.
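Applied to the page from the question, a sketch might wait for the dynamically inserted items with Puppeteer's `page.waitForSelector` before reading their links with `page.$$eval`. This matters because `page.goto` resolves on the `load` event by default, which can be earlier than content fetched afterwards. The helper name `scrapeHrefs` is hypothetical, and the `.listMain > li` selector is carried over from the question and may no longer match the live site. `main` is defined but not called by the block.

```javascript
// Hypothetical helper: wait until `selector` matches, then collect the href of
// every link inside the matched elements. `page` is a Puppeteer Page.
async function scrapeHrefs(page, selector, timeout = 10000) {
  // Rejects with a timeout error if nothing matches within `timeout` ms
  await page.waitForSelector(selector, { timeout });
  return page.$$eval(`${selector} a`, (links) => links.map((a) => a.getAttribute('href')));
}

async function main() {
  // Loaded lazily so scrapeHrefs can be used without Puppeteer installed
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto('http://www.bdtong.co.kr/index.php?c_category=C02');
    const hrefs = await scrapeHrefs(page, '.listMain > li');
    hrefs.forEach((href) => console.log(href));
  } finally {
    await browser.close();
  }
}

module.exports = { scrapeHrefs, main };
```

Calling `main()` needs `npm install puppeteer` and network access. Taking the page as a parameter keeps the extraction logic separate from browser setup, so it can be exercised against a stub object.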

answered Sep 22 '22 13:09

scniro
