I have a web page with some JS APIs that don't alter the DOM but return some numbers. I'd like to write a Node.js application that downloads such pages and executes those functions in the context of the downloaded page.
I was looking at cheerio for page scraping, but while I see how easy it is to navigate and manipulate the DOM with it, I don't see any way to call the page's own functions. Is it possible to do that?
Should I look, instead, at jsdom?
Thanks
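For what it's worth, jsdom can execute a downloaded page's scripts when you opt in, which is what this use case needs. A minimal sketch, assuming the page exposes a global function (hypothetically named getNumbers() here; the URL is a placeholder too):

    // A sketch only: the URL and getNumbers() are placeholders for the real page.
    const { JSDOM } = require("jsdom");

    JSDOM.fromURL("http://example.com/page.html", {
      runScripts: "dangerously", // execute the page's own <script> tags
      resources: "usable",       // also download external scripts the page references
    }).then((dom) => {
      // Wait for the page's scripts to load before calling into them.
      dom.window.addEventListener("load", () => {
        const result = dom.window.getNumbers(); // hypothetical function defined by the page
        console.log(result);
      });
    });

Note that runScripts: "dangerously" runs whatever JavaScript the page ships, so it should only be used on pages you trust.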
Web scraping with JavaScript is a useful technique for extracting data from the Internet for presentation or analysis.
Cheerio can loop over elements with each(). For example:

    import fetch from 'node-fetch';
    import { load } from 'cheerio';

    const url = 'http://webcode.me/countries.html';
    const response = await fetch(url);
    const body = await response.text();

    const $ = load(body);
    $('tr').each((i, el) => {
      console.log($(el).text());
    });
Not surprisingly, some of the most advanced web scraping and browser automation libraries are also written in JavaScript, making it even more attractive for those who want to extract data from the web.
Python is your best bet. Libraries such as requests or HTTPX make it very easy to scrape websites that don't require JavaScript to work correctly; Python offers a lot of simple-to-use HTTP clients. And once you have the response, it's also very easy to parse the HTML, with BeautifulSoup for example.
Sounds like you want to use PhantomJS, which will provide the fully rendered output, and then use cheerio on that.
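A rough sketch of that approach; the script runs under the phantomjs binary rather than node, and the URL and getNumbers() are placeholders for whatever the real page defines:

    // scrape.js -- run with: phantomjs scrape.js
    // The URL and getNumbers() are placeholders for the real page and its API.
    var page = require('webpage').create();

    page.open('http://example.com/page.html', function (status) {
      if (status !== 'success') {
        console.log('failed to load page');
        phantom.exit(1);
      } else {
        // evaluate() runs its callback inside the page, so the page's functions are in scope
        var result = page.evaluate(function () {
          return window.getNumbers(); // hypothetical page function returning numbers
        });
        console.log(JSON.stringify(result));

        // page.content now holds the fully rendered HTML, ready to feed to cheerio
        phantom.exit();
      }
    });

From there, page.content can be printed or written to a file and loaded into cheerio in a regular Node.js process for any remaining DOM navigation.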