I'm trying to do some web scraping with Node.js. Using jsdom, it is easy to load up the DOM and inject JavaScript into it. I want to go one step further: run all JavaScript linked to from the web page and then inspect the resulting DOM, including visual properties (height, width, etc.) of elements.
So far, I get NaN when I try to inspect the dimensions of DOM elements with jsdom.
Is this possible?
It strikes me that there are two distinct challenges:
Another way to ask the question: is it possible to use Node.js as a completely headless browser that you can script?
If this isn't possible, can anyone suggest a library I could use to do this instead? I'm relatively language-agnostic.
Web scraping is the process of extracting data from a website in an automated way, and Node.js can be used for it. Even though other languages and frameworks are more popular for web scraping, Node.js can do the job well too.
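For contrast with the question, this is what plain static scraping looks like with Node.js built-ins only, assuming Node 18 or later for the global `fetch`. `extractTitle` and `scrapeTitle` are hypothetical helper names. The sketch only sees the HTML the server sends: no page scripts run and nothing is laid out, which is exactly the limitation the question runs into.

```javascript
// Naive static scraping with Node.js built-ins (Node 18+ for global fetch).
// Page scripts are NOT executed; only the raw server HTML is inspected.

// Pull the text of the first <title> element out of an HTML string.
// A regex is fine for a demo; a real scraper should use an HTML parser.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Fetch a page and return its <title> text (or null if there is none).
async function scrapeTitle(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return extractTitle(await res.text());
}

// Example (needs network access):
// scrapeTitle("https://example.com/").then(console.log);
```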
Thanks to some creative engineers, it is now feasible to use Node.js modules in browsers, though not directly. Being able to call Node.js modules from JavaScript running in the browser has many advantages because it allows you to use Node.
Both the browser and Node.js use JavaScript as their programming language, yet building apps that run in the browser is a completely different thing from building a Node.js application.
Take a look at PhantomJS. It's incredibly simple to use.
http://www.phantomjs.org/
PhantomJS is a command-line tool that packages and embeds WebKit. It acts like any other WebKit-based web browser, except that nothing gets displayed on screen (hence the term headless). In addition, PhantomJS can be controlled or scripted through its JavaScript API.
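As an illustration of that scripting API, here is a minimal sketch of a PhantomJS script that loads a page (running its scripts) and reads an element's rendered size. It assumes a PhantomJS 2.x binary and is run with `phantomjs measure.js`; the file name, URL, selector and the `measureElement` helper are placeholders. The function passed to `page.evaluate()` runs inside the page, and only its JSON-serialisable return value comes back to the script.

```javascript
// measure.js: run with `phantomjs measure.js` (PhantomJS 2.x assumed).
// TARGET_URL and SELECTOR are placeholders.
var TARGET_URL = "http://example.com/";
var SELECTOR = "h1";

// Executed in the page context via page.evaluate(); returns the element's
// laid-out box, or null if the selector matches nothing.
function measureElement(selector) {
  var el = document.querySelector(selector);
  if (!el) return null;
  var rect = el.getBoundingClientRect();
  return { width: rect.width, height: rect.height, top: rect.top, left: rect.left };
}

// `phantom` is a global only inside PhantomJS; the guard keeps the file
// side-effect free if it is loaded anywhere else.
if (typeof phantom !== "undefined") {
  var page = require("webpage").create();
  page.viewportSize = { width: 1280, height: 800 }; // layout depends on the viewport

  page.open(TARGET_URL, function (status) {
    if (status !== "success") {
      console.log("Failed to load " + TARGET_URL);
      phantom.exit(1);
      return;
    }
    // Scripts that ran during page load have executed by now; work scheduled
    // later (timers, XHR) may still be pending.
    var size = page.evaluate(measureElement, SELECTOR);
    console.log(JSON.stringify(size));
    phantom.exit(0);
  });
}
```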