Mechanize not working for automating gmail login in Google Appengine

I have used mechanize in an app deployed on GAE before, and it works fine. For the app I am building now, though, I am trying to automate the Gmail login through mechanize, and it fails both in the local development environment and after deploying to App Engine.

The same script runs correctly on my own server through mod_python using PSP.

I found a lot of solutions here, but none of them seem to work for me. Here is a snippet of my code:

<snip>
br = mechanize.Browser()
response = br.open("http://www.gmail.com")
loginForm = br.forms().next()
loginForm["Email"] = self.request.get('user')
loginForm["Passwd"] = self.request.get('password')
response = br.open(loginForm.click())
response2 = br.open("http://mail.google.com/mail/h/")
result = response2.read()
<snip>

On App Engine, the result is just the login page again. With mod_python on my own server, the same code returns the page with the user's inbox.

ssandeep asked Apr 12 '11 10:04


1 Answer

The problem is most likely due to how Google crippled the urllib2 module on GAE.

Internally, urllib2 on GAE now uses the urlfetch module (which Google wrote), and the HTTPCookieProcessor() functionality has been removed completely. As a result, cookies are NOT persisted from request to request, and that persistence is the critical piece when logging into sites programmatically.
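
To make the cookie point concrete, here is a minimal sketch of what a cookie processor has to do between two requests: read the Set-Cookie value from one response and turn it into the Cookie header for the next request. The helper name cookie_header_from and the sample cookie values are made up for illustration, and the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from http.cookies import SimpleCookie  # Python 3
except ImportError:
    from Cookie import SimpleCookie  # Python 2, as on the old GAE runtime


def cookie_header_from(set_cookie_value, jar=None):
    """Merge one Set-Cookie value into `jar` and return the Cookie header to send back.

    Hypothetical helper for illustration; it is not part of urlfetch or mechanize.
    """
    if jar is None:
        jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only name=value pairs go back to the server; attributes such as Path are not sent.
    return "; ".join("%s=%s" % (morsel.key, morsel.value) for morsel in jar.values())


# Made-up Set-Cookie value, standing in for what a login response might send.
header = cookie_header_from("SID=abc123; Path=/")
print(header)
```

If nothing does this step between requests, the second request arrives without the session cookie and the site answers with the login page again.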

There is a way around this, but not using mechanize. You have to roll your own Cookie processor - here is the basic approach I took (not perfect, but it gets the job done):

import urllib, urllib2, Cookie
from google.appengine.api import urlfetch
from urlparse import urljoin
import logging

class GAEOpener(object):
    def __init__(self):
        self.cookie = Cookie.SimpleCookie()
        self.last_response = None

    def open(self, url, data = None):
        base_url = url
        if data is None:
            method = urlfetch.GET
        else:
            method = urlfetch.POST
        while url is not None:
            self.last_response = urlfetch.fetch(url = url,
                payload = data,
                method = method,
                headers = self._get_headers(self.cookie),
                allow_truncated = False,
                follow_redirects = False,
                deadline = 10
                )
            data = None # Redirects are followed with a GET, so the payload is not resent.
            method = urlfetch.GET
            self.cookie.load(self.last_response.headers.get('set-cookie', '')) # Remember the cookies set by this response
            # urljoin returns base_url unchanged when there is no Location header, which ends the loop.
            url = urljoin(base_url, self.last_response.headers.get('location'))
            if url == base_url:
                url = None
        return self.last_response

    def _get_headers(self, cookie):
        headers = {
            'Host' : '<ENTER HOST NAME HERE>',
            'User-Agent' : 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)',
            'Cookie' : self._make_cookie_header(cookie)
             }
        return headers

    def _make_cookie_header(self, cookie):
        cookie_header = ""
        for value in cookie.values():
            cookie_header += "%s=%s; " % (value.key, value.value)
        return cookie_header

    def get_cookie_header(self):
        return self._make_cookie_header(self.cookie)

Use it the way you would use urllib2.urlopen, except that you create a GAEOpener instance and call its "open" method. Reuse the same instance for every request so that the cookies carry over.
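
A rough sketch of how the login flow from the question might look with an opener like this. The function login_and_fetch and its login_url and inbox_url parameters are my own names for illustration; "Email" and "Passwd" are the field names from the question. A real login form will most likely carry hidden fields that have to be posted as well, and the POST may need a Content-Type header that _get_headers does not set, so treat this as a starting point rather than a working Gmail login.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, as on the old GAE runtime


def login_and_fetch(opener, login_url, inbox_url, user, password):
    """Post the credentials with `opener`, then fetch `inbox_url` using the same opener.

    `opener` is any object with an open(url, data=None) method, e.g. a GAEOpener.
    """
    # Field names taken from the question's form; hidden form fields are not handled here.
    payload = urlencode({"Email": user, "Passwd": password})
    opener.open(login_url, payload)  # non-None data makes the opener send a POST
    return opener.open(inbox_url)    # plain GET, sent with whatever cookies the login set
```

With a GAEOpener you would call login_and_fetch(GAEOpener(), ...) and read the page body from the returned urlfetch response's content attribute.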

Scott answered Sep 18 '22 08:09