How do I collect data from a website that uses AJAX, with Perl?

This might seem a bit backwards, but I want to use Perl (and curl if possible) to get data from a site that uses AJAX to fill an HTML shell with information. How do I make these JavaScript calls myself to get the data I need?

The website is here: http://www.jigsaw.com/showContactUpdateTab.xhtml?companyId=224230

VolatileRig asked Aug 22 '11 21:08


2 Answers

Remember that AJAX calls are ordinary HTTP requests, so you should always be able to perform them yourself.

Open Firebug or Web Inspector on the website you're talking about and you'll see some XHR calls:

XHR finished loading:
"http://www.jigsaw.com/dwr/interface/UserActionAPI.js"
"http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.getMostPurchasedContacts.dwr"
"http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.getRecentlyGraveyardedContacts.dwr"
"http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.getRecentlyAddedContacts.dwr"
"http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.getRecentlyTitleChangedContacts.dwr"

Yay! Now you know where to get that data. Their scripts send HTTP POST requests to the URLs above, so if you simply open them in your browser, which sends a GET request instead, you'll see various engine errors.

When you sniff their AJAX POST requests (via the Web Inspector debugger, for example), you'll see a request body like this:

callCount=1
page=/showContactUpdateTab.xhtml?companyId=224230
httpSessionId=F5E7EC4A45DFCE87B969A9F4FA06C361
scriptSessionId=D020EFF4333283B907402687182D03E034
c0-scriptName=UserActionAPI
c0-methodName=getRecentlyGraveyardedContacts
c0-id=0
c0-param0=number:224230
c0-param1=boolean:false
c0-param2=boolean:false
batchId=1

I'm fairly sure they generate these session IDs to deter data miners. You may need to dig into their JavaScript code to learn how the IDs are generated.
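Replaying that POST from the command line might look roughly like the sketch below. It assumes the endpoint expects the fields one per line with a text/plain content type (which is how DWR normally sends them), and that session IDs copied from the browser are still valid when you run it. The helper names dwr_body and dwr_call are made up for illustration.

```shell
#!/bin/sh
# Sketch: replay the DWR call seen in the browser's network panel.

# Print a DWR "plaincall" request body, one key=value pair per line.
# Arguments: page, httpSessionId, scriptSessionId, method name, company id.
dwr_body() {
    printf '%s\n' \
        "callCount=1" \
        "page=$1" \
        "httpSessionId=$2" \
        "scriptSessionId=$3" \
        "c0-scriptName=UserActionAPI" \
        "c0-methodName=$4" \
        "c0-id=0" \
        "c0-param0=number:$5" \
        "c0-param1=boolean:false" \
        "c0-param2=boolean:false" \
        "batchId=1"
}

# POST that body to the matching .dwr URL and print the raw response.
# --data-binary keeps the newlines intact (plain --data would strip them).
dwr_call() {
    dwr_body "$@" |
        curl --silent --show-error \
             --header 'Content-Type: text/plain' \
             --data-binary @- \
             "http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.$4.dwr"
}

# Example, using the values captured above (the IDs have likely expired):
#   dwr_call '/showContactUpdateTab.xhtml?companyId=224230' \
#       F5E7EC4A45DFCE87B969A9F4FA06C361 \
#       D020EFF4333283B907402687182D03E034 \
#       getRecentlyGraveyardedContacts 224230
```

The response is DWR's own JavaScript-flavoured text rather than clean JSON, so expect to do some parsing on the Perl side. From Perl you can either shell out to a script like this or send the same body with LWP.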

Daniel O'Hara answered Oct 26 '22 11:10


Some applications have code in place to check that the client is a real AJAX client. They simply check for the presence of the header X-Requested-With: XMLHttpRequest, so the check is easy to circumvent:

curl -H 'X-Requested-With: XMLHttpRequest' ...

use HTTP::Request::Common;
GET $url, 'X-Requested-With' => 'XMLHttpRequest', ...

Of course, you might have to deal with the usual complications, such as required session cookies and nonce parameters. Firebug, or the equivalent developer tools in other browsers, will help you reverse-engineer the required headers and parameters.
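For the session-cookie part, one possible approach with curl is to load the HTML page first so the server can set its cookies, then reuse them for the AJAX request. This is only a sketch: the function names and the cookie jar path are hypothetical, and a site that also requires nonce parameters will need more than this.

```shell
#!/bin/sh
# Sketch: collect session cookies from the HTML shell, then call the
# AJAX URL with those cookies plus the X-Requested-With header.

# Where cookies are stored; a hypothetical default, override as needed.
COOKIE_JAR="${COOKIE_JAR:-./cookies.txt}"

# Load the page the way a browser would and save any cookies it sets.
open_session() {
    curl --silent --show-error --output /dev/null \
         --cookie-jar "$COOKIE_JAR" \
         "$1"
}

# Fetch an AJAX URL, sending the stored cookies and the header that
# many servers use to recognise an XMLHttpRequest client.
ajax_get() {
    curl --silent --show-error \
         --cookie "$COOKIE_JAR" --cookie-jar "$COOKIE_JAR" \
         --header 'X-Requested-With: XMLHttpRequest' \
         "$1"
}

# Example:
#   open_session 'http://www.jigsaw.com/showContactUpdateTab.xhtml?companyId=224230'
#   ajax_get "$some_ajax_url"    # $some_ajax_url: a URL found in the network panel
```

In Perl, an LWP::UserAgent object with a cookie jar attached plays the same role as the two curl calls here.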

Lumi answered Oct 26 '22 13:10