Are there command line or library tools for rendering webpages that use JavaScript?

Page scraping on the Internet seems to have hit something of a wall for me, as more and more sites depend on JavaScript to render portions of the screen.

It seems to me that with so many open source layout and JavaScript renderers released (like WebKit, Gecko and Chromium + V8) that someone must have made a tool for downloading a page and rendering its JavaScript without having to run an actual browser. However, I'm not turning up what I'm looking for with my searches - I've found tools like Selenium-rc, but they depend on a running browser. I'm interested in any tool or library which can do one (or both) of the following:

  1. A program that can be run from the command line (*nix) which, given the source of a page, returns the page's source as rendered by some JS engine.

  2. Integrated support in a particular language that lets one (easily) pass it the source of a page and get back the page's source as rendered by some JS engine.

I think #1 is preferable in a general sense, but #2 would be more useful if the tool exists in the language I want to work in. Also, I'm not concerned with the particular JS engine - any relatively modern one will do. What is out there?

Dan Lew asked Apr 07 '09 19:04


2 Answers

wkhtmltopdf (WebKit HTML to PDF) works perfectly; it can even produce JPG images.

http://wkhtmltopdf.googlecode.com
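For anyone who wants to script this, here is a minimal sketch of calling the tool from Python. It assumes the `wkhtmltopdf` binary is installed and on your PATH; the helper names (`build_wkhtmltopdf_command`, `render_to_pdf`), the example URL, and the 1000 ms delay are made up for illustration. The `--javascript-delay` option tells the tool how long to wait for scripts to finish before it captures the page. Note that this gives you a rendered PDF, not the rendered HTML source the question asks for.

```python
import shutil
import subprocess


def build_wkhtmltopdf_command(url, output_path, js_delay_ms=1000):
    """Build the argument list for one wkhtmltopdf run (hypothetical helper).

    --javascript-delay is the number of milliseconds the tool waits for
    page scripts to finish before it captures the page.
    """
    return [
        "wkhtmltopdf",
        "--javascript-delay", str(js_delay_ms),
        url,
        output_path,
    ]


def render_to_pdf(url, output_path, js_delay_ms=1000):
    """Render a page to PDF; assumes wkhtmltopdf is installed on PATH."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf was not found on PATH")
    command = build_wkhtmltopdf_command(url, output_path, js_delay_ms)
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(command, check=True)
    return output_path


if __name__ == "__main__":
    # Example URL and output file name are placeholders.
    render_to_pdf("http://example.com/", "page.pdf")
```

Keeping the command builder separate from the function that runs it makes it easy to inspect or log the exact command line before anything is executed.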

h4ck3rm1k3 answered Oct 01 '22 18:10


You can look at HTMLUnit. Its main purpose is automated web testing, but I think it may let you get the rendered page.

Sergey answered Oct 01 '22 19:10