 

Java: How to get the HTML code from a URL, including its AJAX-generated code, using Firebug or any Java library

I need to obtain the code of a web page that is partly "pure HTML" and partly HTML generated with AJAX/JavaScript.

Since the easiest way to get it seems to be Firebug, I thought there must be some way to use Firebug, or some plugin for it, from Java code.

The problem is that after searching through many websites and portals I have found nothing.

Does anyone know a way (or a plugin) to get this AJAX-generated code mixed with the static HTML, as Firebug does?

Thanks, and please excuse my English.

Alberto Martín asked Oct 09 '22 20:10


1 Answer

Abhijeet is kinda on the right track, but I'm going to take the time to explain how browsers treat webpages and help you understand why your request is so difficult.

Warning: this is quite rough, and I am fudging some details for brevity and clarity.

A browser connects to a server and uses HTTP to retrieve the page you requested. Once the page is downloaded, the browser looks for any additional resources referenced in it and retrieves those too. It then executes the JavaScript it finds, top to bottom, including any referenced scripts. This JavaScript can manipulate the page, but by this point the browser no longer cares much about the page's original source code: it has internalized it as a DOM, or Document Object Model. The DOM is just a tree structure, and JavaScript manipulates that tree rather than the source text. As a result, Firebug (or the WebKit inspector) doesn't actually display the source code; it displays a representation of the current state of the DOM.
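To make the source-versus-DOM distinction concrete, here is a minimal sketch using only the JDK's built-in XML DOM classes. It is not how a browser is implemented: the page string, the class name `DomVersusSource` and the `simulateScript` method are all made up for illustration, and the "script" is plain Java standing in for page JavaScript. The source is written as well-formed XHTML because the JDK parser would reject sloppy real-world HTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomVersusSource {

    // Hypothetical page source (an assumption for this sketch).
    public static final String SOURCE =
        "<html><body><div id=\"content\"></div></body></html>";

    /** Parses the source text into a DOM tree, as a browser does after download. */
    public static Document parse(String source) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(source)));
    }

    /** Stands in for a page script: it changes the tree, never the source string. */
    public static void simulateScript(Document dom) {
        Element div = (Element) dom.getElementsByTagName("div").item(0);
        Element paragraph = dom.createElement("p");
        paragraph.setTextContent("Loaded later");
        div.appendChild(paragraph);
    }

    /** Serializes the current DOM state, roughly what an inspector displays. */
    public static String serialize(Document dom) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document dom = parse(SOURCE);
        simulateScript(dom);
        System.out.println("Original source: " + SOURCE);
        System.out.println("Current DOM:     " + serialize(dom));
    }
}
```

The original string never changes; only the serialized tree contains the new paragraph. That is the same reason "view source" and an inspector can disagree about a page.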

The problem with your request is that you want a separate system to load a URL and then go through the entire process above. Unfortunately, that would require you to implement an entire JavaScript engine, along with the DOM it operates on, in Java.
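For comparison, this is roughly what a plain HTTP download from Java gives you. The sketch assumes Java 11 or later, and to avoid depending on a real site it serves a made-up page from a throwaway local server (the JDK's `com.sun.net.httpserver.HttpServer`). The page content and the class name `PlainFetchDemo` are assumptions for illustration only.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class PlainFetchDemo {

    // Hypothetical page: the <div> is empty in the source and only filled in by the script.
    public static final String PAGE =
        "<html><body><div id=\"out\"></div>"
        + "<script>document.getElementById('out').textContent = 'filled by script';</script>"
        + "</body></html>";

    /** Serves PAGE from a local server and downloads it with a plain HTTP client. */
    public static String fetchOverHttp() throws Exception {
        // Port 0 lets the operating system pick any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = PAGE.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            return response.body();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        // The client only downloads text; nothing ever runs the <script>.
        System.out.println(fetchOverHttp());
    }
}
```

The downloaded text still has the empty `<div id="out"></div>`, because an HTTP client only transfers bytes and never executes the script. Getting the filled-in version means running the page in something that has both a JavaScript engine and a DOM.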

However, all is not lost. HtmlUnit (mentioned by others) is a working headless browser written in Java, so you can integrate it into your program. Actually doing that is beyond the scope of this answer, but the homepage is here and the API documentation is here.

Aatch answered Oct 12 '22 11:10