I am trying to scrape a website, but some elements are missing from the result because they are created dynamically.
I am using cheerio in Node.js; my code is below.
var request = require('request');
var cheerio = require('cheerio');

var url = "http://www.bdtong.co.kr/index.php?c_category=C02";

request(url, function (err, res, html) {
  var $ = cheerio.load(html);
  $('.listMain > li').each(function () {
    console.log($(this).find('a').attr('href'));
  });
});
This code prints nothing, because when the page is first loaded the <ul id="store_list" class="listMain"> element is empty. Its content has not been appended yet.
How can I get these elements using node.js? How can I scrape pages with dynamic content?
Web scraping is the process of extracting data from a website in an automated way, and Node.js can be used for it. Although other languages and frameworks are more popular for web scraping, Node.js does the job well too.
Getting started with Scrapy: after installing Scrapy, choose a local directory on your computer for the project, open a terminal, and run the command scrapy startproject [name of project] to create it. Once the project directory has been created, change into it.
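Here you go. This uses the callback-style API of older releases of the phantom package, and returns the links from the page instead of logging them inside it:

var phantom = require('phantom');

phantom.create(function (ph) {
  ph.createPage(function (page) {
    var url = "http://www.bdtong.co.kr/index.php?c_category=C02";
    page.open(url, function () {
      // Inject jQuery so the selector code below can run inside the page.
      page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function () {
        // The first function runs inside the page; its return value is passed to the second.
        page.evaluate(function () {
          return $('.listMain > li').map(function () {
            return $(this).find('a').attr('href');
          }).get();
        }, function (hrefs) {
          (hrefs || []).forEach(function (href) {
            console.log(href);
          });
          ph.exit();
        });
      });
    });
  });
});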
Check out GoogleChrome/puppeteer
Headless Chrome Node API
It makes scraping pretty trivial. The following example scrapes the headline over at npmjs.com (assuming the .npm-expansions element remains):
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.npmjs.com/');

  // The callback runs inside the page, against the live DOM.
  const textContent = await page.evaluate(() => {
    return document.querySelector('.npm-expansions').textContent;
  });

  console.log(textContent); /* No Problem Mate */

  await browser.close();
})();
evaluate runs the given function inside the page, where the page's own scripts have already executed, so dynamically created elements can be inspected.