Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does urllib2 in Python 2.6.1 support proxy via https

Does urllib2 in Python 2.6.1 support proxy via https?

I've found the following at http://www.voidspace.org.uk/python/articles/urllib2.shtml:

NOTE

Currently urllib2 does not support fetching of https locations through a proxy. This can be a problem.

I'm trying automate login in to web site and downloading document, I have valid username/password.

proxy_info = {
    'host':"axxx", # commented out the real data
    'port':"1234"  # commented out the real data
}

proxy_handler = urllib2.ProxyHandler(
                 {"http" : "http://%(host)s:%(port)s" % proxy_info})
opener = urllib2.build_opener(proxy_handler,
         urllib2.HTTPHandler(debuglevel=1),urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)

fullurl = 'https://correct.url.to.login.page.com/user=a&pswd=b' # example
req1 = urllib2.Request(url=fullurl, headers=headers)
response = urllib2.urlopen(req1)

I've had it working for similar pages but not using HTTPS and I suspect it does not get through proxy - it just gets stuck in the same way as when I did not specify proxy. I need to go out through proxy.

I need to authenticate but not using basic authentication, will urllib2 figure out authentication when going via https site (I supply username/password to site via url)?

EDIT: Nope, I tested with

   proxies = {
        "http" : "http://%(host)s:%(port)s" % proxy_info,
        "https" : "https://%(host)s:%(port)s" % proxy_info
    }

    proxy_handler = urllib2.ProxyHandler(proxies)

And I get error:

urllib2.URLError: urlopen error [Errno 8] _ssl.c:480: EOF occurred in violation of protocol

like image 548
stefanB Avatar asked Jun 23 '09 00:06

stefanB


2 Answers

Fixed in Python 2.6.3 and several other branches:

  • _bugs.python.org/issue1424152 (replace _ with http...)
  • http://www.python.org/download/releases/2.6.3/NEWS.txt

    Issue #1424152: Fix for httplib, urllib2 to support SSL while working through proxy. Original patch by Christopher Li, changes made by Senthil Kumaran.

like image 58
Gringo Suave Avatar answered Sep 18 '22 22:09

Gringo Suave


I'm not sure Michael Foord's article, that you quote, is updated to Python 2.6.1 -- why not give it a try? Instead of telling ProxyHandler that the proxy is only good for http, as you're doing now, register it for https, too (of course you should format it into a variable just once before you call ProxyHandler and just repeatedly use that variable in the dict): that may or may not work, but, you're not even trying, and that's sure not to work!-)

like image 26
Alex Martelli Avatar answered Sep 17 '22 22:09

Alex Martelli