Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting HTML from web pages that use AJAX

Tags:

I wanted to know how to scrape web pages that use AJAX to fetch content on the web page being rendered. Typically a HTTP GET for such pages will just fetch the HTML page with the JavaScript code embedded in it. But I want to know if it is possible to programmatically (preferably Java) query for such pages and simulate a web browser kind of a request so that I get the HTML content resulting after the AJAX calls.

like image 606
Aayush Puri Avatar asked Feb 08 '10 11:02

Aayush Puri


1 Answers

In The Productive Programmer author Neal Ford suggests that the functional testing tool Selenium can be used for non-testing tasks. Your task of inspecting HTML after client side DOM manipulation has taken place falls into this category. Selenium even allows you to automate interactions with the browser so if you need some buttons clicked to fire some AJAX events, you can script it. Selenium works by using a browser plugin and a java based server. Selenium test code (or non-test code in your case) can be written in a variety of languages including java, C# and other .Net languages, php, perl, python and ruby.

like image 56
Asaph Avatar answered Oct 12 '22 21:10

Asaph