
Executing scraped JavaScript with cheerio

I have a web page containing some JS APIs that don't alter the DOM but return some numbers. I'd like to write a Node.js application that downloads such pages and executes those functions in the context of the downloaded page.

I was looking at cheerio for page scraping, but while I can see how easy it is to navigate and manipulate the DOM with it, I don't see any way to run the page's functions. Is it possible to do that?

Should I look, instead, at jsdom?

Thanks

Tonyx asked Feb 22 '13 13:02




1 Answer

It sounds like you want PhantomJS: it runs the page's scripts and gives you the fully rendered output, which you can then pass to cheerio.
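For the narrower case described in the question, where the page functions only return numbers and never touch the DOM, a headless browser may not be needed at all. The following is a minimal sketch of a different technique, not the PhantomJS approach: it evaluates the page's inline script text with Node's built-in `vm` module. The `html` string, the `getTotal` function, and the helpers `extractInlineScripts` and `runPageFunction` are all hypothetical names made up for illustration, and the regex-based extraction assumes simple inline scripts only (no external `src` files, no DOM access).

```javascript
// Sketch only: run inline <script> code from a downloaded page in a vm context.
const vm = require('node:vm');

// Hypothetical stand-in for a page that has already been downloaded.
const html = `
<html><body>
<script>
  function getTotal() { return 40 + 2; }
</script>
</body></html>`;

// Collect the bodies of inline <script> elements.
// A regex keeps this sketch dependency-free; an HTML parser such as
// cheerio could be used to select the script elements instead.
function extractInlineScripts(pageHtml) {
  const scripts = [];
  const pattern = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  let match;
  while ((match = pattern.exec(pageHtml)) !== null) {
    scripts.push(match[1]);
  }
  return scripts;
}

// Evaluate every inline script in a fresh context, then call the
// named global function and return its result.
function runPageFunction(pageHtml, functionName) {
  const context = vm.createContext({}); // no window, no document
  for (const source of extractInlineScripts(pageHtml)) {
    vm.runInContext(source, context, { timeout: 1000 });
  }
  return vm.runInContext(`${functionName}()`, context, { timeout: 1000 });
}

const total = runPageFunction(html, 'getTotal');
console.log(total); // prints 42
```

Because the context is empty, any script that reads `window` or `document` will throw; in that case jsdom or a headless browser is the better fit. Note also that the `vm` module is not a security boundary, so this should only be pointed at pages you trust.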

Mark Selby answered Oct 05 '22 21:10
