
post a form using jsdom and node.js

I am using jsdom, jQuery and node.js to scrape websites. Is there any way I can post a form and get the window for the resulting page using jsdom?

Here is the code

var jsdom = require('jsdom'),
    request = require('request');

request({uri: 'http://www.orbitz.com'}, function(error, response, body) {
  // Stop on a transport error or on any non-200 response.
  if (error || response.statusCode != 200) {
    console.log('Error on request');
    return;
  }

  jsdom.env({
    html: body,
    scripts: [
      'http://code.jquery.com/jquery-1.5.min.js'
    ]
  }, function(err, window) {
    var $ = window.jQuery;

    // Select a one-way flight and fill in the airports.
    $('#airOneWay').attr('checked', true);
    $('#airRoundTrip').removeAttr('checked');
    $('#airOrigin').val('ATL');
    $('#airDestination').val('CHI');

    // here we need to submit the form $('#airbotForm') and get the resulting window
    //console.log($('#airbotForm').html());
  });
});

The form that needs to be submitted is $('#airbotForm'), and the resulting page has to be captured.

Can anybody help? Thanks

Asked by Madhusudhan, Jun 07 '11 09:06


1 Answer

Oh man. This is where we get into crazy land.

As it stands, the key difference between jsdom and "the browser" is that we can access the window externally. For instance, in your example you set $ to window.jQuery, which is basically saying "hey, for this current window I want a reference to the jQuery object". You could have tens of windows and hold references to all of their $'s.

Now, let's say you load a new page due to a form submission or link click...

jsdom would need to reload the window and update the JavaScript context (potentially re-injecting the scripts you provided in the original jsdom.env call). Unfortunately, the references you held from the previous window would be gone or overwritten. In other words, calling $(...) after the page had reloaded would result in unexpected behavior (most likely a memory leak, or selecting DOM elements on the previous page).
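To make the stale-reference problem concrete, here is a rough sketch that uses Node's built-in vm module as a stand-in for jsdom windows. It is only an analogy: loadPage and the title field are hypothetical names, and no real jsdom API is involved. Each "page" gets its own context with its own $, and a $ captured from the first page keeps answering for the first page after the second one is loaded.

```javascript
const vm = require('vm');

// Hypothetical stand-in for loading a page: every page gets its own
// JavaScript context with its own global $ function.
function loadPage(title) {
  const context = vm.createContext({ title: title });
  // A top-level var inside the context becomes a property of the context object.
  vm.runInContext('var $ = function () { return title; };', context);
  return context;
}

const firstPage = loadPage('search form');
const staleQuery = firstPage.$;          // reference held by outside code

const secondPage = loadPage('results');  // the "form submission" loads a new page

console.log(staleQuery());    // still answers for the first page
console.log(secondPage.$());  // only the new context knows about the new page
```

Outside code has to pick up a fresh $ from each new window explicitly; nothing redirects the old reference.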

How do you get around this?

Since you are using jQuery already, do something like:

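var url = require('url');

var form   = $('#airbotForm');
var data   = form.serialize();
// Resolve a relative action against the page that was originally requested.
var action = url.resolve('http://www.orbitz.com', form.attr('action') || '');
// Forms without a method attribute default to GET.
var method = (form.attr('method') || 'get').toUpperCase();

var options = { url: action, method: method };
if (method === 'GET') {
  // GET forms carry their fields in the query string, not the body.
  options.url += (action.indexOf('?') === -1 ? '?' : '&') + data;
} else {
  options.body = data;
  // serialize() always produces URL-encoded data, whatever the form's enctype says.
  options.headers = { 'Content-Type': 'application/x-www-form-urlencoded' };
}

request(options, function(error, response, body) {
  // this assumes no error for brevity.
  jsdom.env(body, [/* scripts */], function(errors, window) {
    // do your post processing
  });
});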

YMMV, but this approach should work in non-ajax situations.
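The request-building step can also be pulled into a plain function so it can be checked without touching the network. This is only a sketch under assumptions: buildFormRequest is a hypothetical helper, the action, method and field names below are made up for illustration (the real ones come from the page's markup), and it uses Node's built-in url and querystring modules rather than jQuery's serialize().

```javascript
const url = require('url');
const querystring = require('querystring');

// Hypothetical helper: turns a form's attributes and field values into
// an options object of the shape passed to request().
function buildFormRequest(pageUrl, form, fields) {
  // Forms without a method attribute default to GET.
  const method = (form.method || 'get').toUpperCase();
  // A missing or relative action resolves against the page URL.
  const target = url.resolve(pageUrl, form.action || '');
  const data = querystring.stringify(fields);

  if (method === 'GET') {
    // GET forms carry their fields in the query string.
    const separator = target.indexOf('?') === -1 ? '?' : '&';
    return { url: target + separator + data, method: method };
  }
  return {
    url: target,
    method: method,
    body: data,
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' }
  };
}

// Assumed attribute values and field names, for illustration only.
const postOptions = buildFormRequest(
  'http://www.orbitz.com/',
  { action: '/search', method: 'post' },
  { airOrigin: 'ATL', airDestination: 'CHI' }
);
const getOptions = buildFormRequest(
  'http://www.orbitz.com/',
  { action: '/search' },
  { airOrigin: 'ATL' }
);

console.log(postOptions);
console.log(getOptions);
```

The returned object would be handed to request() in place of the inline options, and the response body then goes to a fresh jsdom.env call as before.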

Answered by tmpvar, Sep 27 '22 17:09