Using a Ruby script to log in to a website via HTTPS

Alright, so here's the dealio: I'm working on a Ruby app that'll take data from a website, and aggregate that data into an XML file.
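As a rough sketch of the "aggregate into an XML file" step, Ruby's bundled REXML library can build the document from whatever gets scraped. The `Message` struct, its fields, and the element names below are hypothetical; they would need to match what the PM pages actually contain.

```ruby
require 'rexml/document'

# Hypothetical record for one archived PM; the real fields depend on
# what the site's HTML actually exposes.
Message = Struct.new(:id, :from, :subject, :sent_at, :body)

# Builds an XML string shaped like:
#   <messages><message id='1'><from>..</from>...</message></messages>
# REXML escapes markup characters such as & and < in text nodes.
def messages_to_xml(messages)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  root = doc.add_element('messages')
  messages.each do |msg|
    el = root.add_element('message', 'id' => msg.id.to_s)
    el.add_element('from').text    = msg.from
    el.add_element('subject').text = msg.subject
    el.add_element('sent_at').text = msg.sent_at
    el.add_element('body').text    = msg.body
  end
  out = +''
  # Default (unindented) formatter: text nodes are written unchanged.
  doc.write(out)
  out
end
```

Saving it is then just `File.write('pms.xml', messages_to_xml(messages))` (the file name is arbitrary). The sketch sticks with REXML's unindented output because its pretty-printer can reflow whitespace inside text nodes, which would alter archived message bodies.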

The website I need to take data from does not have any APIs I can make use of, so the only thing I can think of is to log in to the website, sequentially load the pages that have the data I need (in this case, PMs; I want to archive them), and then parse the returned HTML.
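The "load the pages in order" part can be kept separate from the HTTP and HTML details. Below is a minimal sketch, assuming the PMs are spread over numbered inbox pages; the `?page=N` URL pattern, `inbox_page_url`, `crawl_pages`, and the `fetch`/`parse` callables are all hypothetical stand-ins for the real site and the real (logged-in) request.

```ruby
# Hypothetical URL pattern for a paginated PM inbox; the real site's
# URLs will differ.
def inbox_page_url(base, page)
  "#{base}/messages?page=#{page}"
end

# Loads pages 1, 2, 3, ... in order and collects what each page yields.
# `fetch` is any callable returning the HTML for a URL (e.g. an
# authenticated GET); `parse` turns that HTML into an array of items.
# Stops at the first page that yields nothing, or after max_pages.
def crawl_pages(base, fetch:, parse:, max_pages: 100)
  items = []
  (1..max_pages).each do |page|
    html  = fetch.call(inbox_page_url(base, page))
    found = parse.call(html)
    break if found.empty?
    items.concat(found)
  end
  items
end
```

In real use `fetch` would perform the logged-in request and `parse` would pull messages out with an HTML parser such as Nokogiri (which Mechanize also builds on) rather than simple string splitting.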

The problem, though, is that I don't know of any ways to programmatically simulate a login session.

Would anyone have any advice, or know of any proven methods that I could use to successfully log in to an HTTPS page, and then programmatically load pages from the site using a temporary cookie session from the login? It doesn't have to be a Ruby-only solution -- I just wanna know how I can actually do this. And if it helps, the website in question is one that uses Microsoft's .NET Passport service as its login/session mechanism.
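Without a browser-like library, the core idea is to keep the cookies the login response sets and send them back on every later request. Here is a minimal sketch using only Ruby's standard `net/http`, assuming a plain HTML login form: `SimpleSession`, `login`, and the `username`/`password` field names are hypothetical, and the cookie handling ignores `Domain`, `Path` and `Expires`, so it only suits a single host.

```ruby
require 'net/http'
require 'uri'

# Minimal cookie "session" (hypothetical helper, not a library class):
# remembers name=value pairs from Set-Cookie headers and sends them back
# in a Cookie header. Domain/Path/Expires are ignored, so this only
# suits a single host.
class SimpleSession
  attr_reader :cookies

  def initialize
    @cookies = {}
  end

  # Record every cookie set by an HTTP response.
  def store(response)
    Array(response.get_fields('Set-Cookie')).each do |header|
      name, value = header.split(';', 2).first.to_s.split('=', 2)
      next if name.nil? || name.strip.empty?
      @cookies[name.strip] = value.to_s.strip
    end
  end

  # "a=1; b=2" form expected by the Cookie request header.
  def cookie_header
    @cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
  end

  # GET request for a later page, carrying the session cookies.
  def get_request(uri)
    req = Net::HTTP::Get.new(uri)
    req['Cookie'] = cookie_header unless @cookies.empty?
    req
  end
end

# Hypothetical login: POST the credentials to a login form over HTTPS and
# keep whatever cookies the response sets. The URL and the 'username' /
# 'password' field names are placeholders, not the real Passport form.
def login(session, login_uri, user, password)
  Net::HTTP.start(login_uri.host, login_uri.port, use_ssl: true) do |http|
    req = Net::HTTP::Post.new(login_uri)
    req.set_form_data('username' => user, 'password' => password)
    response = http.request(req)
    session.store(response)
    response
  end
end
```

Later pages would then be requested with `http.request(session.get_request(page_uri))`, calling `session.store` on each response so refreshed cookies are kept. A .NET Passport login likely involves redirects across more than one host, which this sketch does not follow; a library that handles redirects and keeps a full cookie jar is a better fit there.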

Any input on the matter is welcome. Thanks.

Bapabooiee asked Nov 14 '09 09:11


1 Answer

Mechanize

Mechanize is a Ruby library that imitates the behaviour of a web browser. You can click links, fill out forms and submit them. It even keeps a history and remembers cookies. It seems your problem could be easily solved with the help of Mechanize.

The following example is taken from http://docs.seattlerb.org/mechanize/EXAMPLES_rdoc.html:

require 'rubygems'
require 'mechanize'

a = Mechanize.new
a.get('http://rubyforge.org/') do |page|
  # Click the login link
  login_page = a.click(page.link_with(:text => /Log In/))

  # Submit the login form
  my_page = login_page.form_with(:action => '/account/login.php') do |f|
    f.form_loginname  = ARGV[0]
    f.form_pw         = ARGV[1]
  end.click_button

  my_page.links.each do |link|
    text = link.text.strip
    next unless text.length > 0
    puts text
  end
end

johannes answered Sep 23 '22 13:09