When web scraping with Node.js, can I run all the JavaScript on the page (i.e., simulate a real browser)?

Tags: node.js, screen-scraping

I'm trying to do some web scraping with Node.js. Using jsdom, it is easy to load up the DOM and inject JavaScript into it. I want to go one step further: run all the JavaScript that the web page links to, then inspect the resulting DOM, including the visual properties (height, width, etc.) of elements.

Thus far, I get NaN when I try to inspect the dimensions of DOM elements with jsdom.

Is this possible?

It strikes me that there are two distinct challenges:

1. Running all the JS on the web page
2. Getting Node to simulate the window/screen rendering in addition to just the DOM

Another way to ask the question: is it possible to use Node.js as a fully scriptable headless browser?

If this isn't possible, does anyone have suggestions for what library I can use to do this? I'm relatively language agnostic.

373

asked Oct 20 '11 21:10

Ted Benson

1 Answer

Take a look at PhantomJS. It is incredibly simple to use.

http://www.phantomjs.org/

PhantomJS is a command-line tool that packs and embeds WebKit. Literally it acts like any other WebKit-based web browser, except that nothing gets displayed to the screen (thus, the term headless). In addition to that, PhantomJS can be controlled or scripted using its JavaScript API.
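As a rough illustration of how this addresses both challenges in the question, here is a minimal sketch of a PhantomJS script that loads a page, lets its scripts run, and reads an element's rendered size. It assumes PhantomJS is installed and the file is run as `phantomjs measure.js`. The URL, the `h1` selector, the file name, and the `measure` helper are placeholders chosen for this example, not anything from the question.

```javascript
// measure.js (hypothetical file name); run with: phantomjs measure.js

// Executed inside the loaded page by page.evaluate(), so it may only use
// its arguments and the page's own globals (no outer variables).
function measure(selector) {
  var el = document.querySelector(selector);
  if (!el) {
    return null; // nothing matched the selector
  }
  var rect = el.getBoundingClientRect(); // real layout box, computed by WebKit
  return { width: rect.width, height: rect.height };
}

// The `phantom` global only exists when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var url = 'http://example.com/'; // placeholder URL (assumption)

  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('Failed to load ' + url);
      phantom.exit(1);
      return;
    }
    // The page's scripts have been loaded and executed by this point.
    var size = page.evaluate(measure, 'h1'); // 'h1' is a placeholder selector
    console.log(JSON.stringify(size));
    phantom.exit();
  });
}
```

Because the page is actually laid out by WebKit, the width and height come back as numbers rather than NaN. Pages that keep changing the DOM asynchronously after load may need an extra wait (for example, a `setTimeout` before calling `page.evaluate`) before measuring.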

168

answered Nov 09 '22 02:11

Dal Hundal

