Web crawler that can interpret JavaScript [closed]

Question

I want to write a web crawler that can interpret JavaScript. Basically, it's a program in Java or PHP that takes a URL as input and outputs the DOM tree, similar to what the Firebug HTML window shows. The best example is Kayak.com, where 'view source' does not show the DOM that the browser ends up displaying, but you can save the resulting HTML through Firebug.

How would I go about doing this? What tools exist that would help me?

tokland · Accepted Answer

Ruby's Capybara is an integration testing library, but it can also be used to write stand-alone web crawlers. Because it uses backends like Selenium or headless WebKit, it interprets JavaScript out of the box:

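require 'capybara/dsl'
require 'capybara-webkit'

# Bring visit, page, etc. into scope outside of a test framework
include Capybara::DSL
# Use the headless WebKit backend so scripts on the page are executed
Capybara.current_driver = :webkit
# Base URL that relative paths passed to visit are resolved against
Capybara.app_host = "http://www.google.com"
page.visit("/")
# page.html is the DOM after JavaScript has run, not the raw source
puts(page.html)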
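
To turn this into an actual crawler you would also need to pull links out of the rendered HTML and feed them back into visit. Here is a minimal sketch of that step using only the standard library. The helper name extract_links is hypothetical (it is not part of Capybara), and the regex is a deliberate simplification that only handles quoted href attributes:

```ruby
require 'uri'
require 'set'

# Hypothetical helper (not a Capybara API): collect the absolute http(s)
# links found in a string of rendered HTML such as page.html.
# Assumption: href values are quoted; a regex is not a full HTML parser.
def extract_links(html, base_url)
  links = Set.new
  html.scan(/<a\s[^>]*?href\s*=\s*["']([^"']+)["']/i).flatten.each do |href|
    begin
      # Resolve relative paths against the page's URL
      uri = URI.join(base_url, href)
    rescue URI::Error
      next # skip hrefs that are not valid URIs
    end
    # Keep only http/https (drops mailto:, javascript:, etc.)
    next unless uri.is_a?(URI::HTTP)
    # Ignore in-page anchors so the same page is not queued twice
    uri.fragment = nil
    links << uri.to_s
  end
  links.to_a
end
```

With Capybara itself you can probably skip the regex and ask the driver for the anchors directly, e.g. page.all('a').map { |a| a[:href] }, then run the same URI filtering on the result.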

Tags:

javascript

web-crawler

user320662
