
Why does urllib.urlopen(url) fail while urllib2.urlopen(url) works? What specifically about the server response is causing this?

I just want a better idea of what's going on here; I can of course work around the problem by using urllib2.

import urllib
import urllib2

url = "http://www.crutchfield.com/S-pqvJFyfA8KG/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html"

# urllib2 works fine (foo.headers / foo.read() also behave)
foo = urllib2.urlopen(url)

# urllib raises an exception, though. What specifically is causing this?
bar = urllib.urlopen(url)

http://pae.st/AxDW/ shows this code in action with the exception/stacktrace. foo.headers and foo.read() work fine.

$ curl -I "http://www.crutchfield.com/S-pqvJFyfA8KG/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html"

HTTP/1.1 302 Object Moved
Cache-Control: private
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Location: /S-FSTWJcduy5w/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html
Server: Microsoft-IIS/7.5
Set-Cookie: SESSIONID=FSTWJcduy5w; domain=.crutchfield.com; expires=Fri, 22-Feb-2013 22:06:43 GMT; path=/
Set-Cookie: SYSTEMID=0; domain=.crutchfield.com; expires=Fri, 22-Feb-2013 22:06:43 GMT; path=/
Set-Cookie: SESSIONDATE=02/23/2012 17:07:00; domain=.crutchfield.com; expires=Fri, 22-Feb-2013 22:06:43 GMT; path=/
X-AspNet-Version: 4.0.30319
HostName: cws105
Date: Thu, 23 Feb 2012 22:06:43 GMT

Thanks.

sente asked Feb 23 '12 22:02


1 Answer

This server is both non-deterministic and sensitive to the HTTP version. urllib2 sends HTTP/1.1 requests, while urllib sends HTTP/1.0. You can reproduce the failure by running curl --http1.0 -I "http://www.crutchfield.com/S-pqvJFyfA8KG/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html" a few times in a row. Occasionally you should see curl: (52) Empty reply from server; that is the same failure urllib is reporting. (If you re-issue the request several times with urllib, it should succeed sometimes.)
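Since the failure is intermittent, one way to cope if you have to stay on urllib is to retry. This is only a minimal sketch, assuming the empty reply surfaces as an IOError (which is what I'd expect from urllib.urlopen on Python 2; on Python 3 that name is an alias for OSError). fetch_with_retry, attempts and delay are made-up names for illustration, not part of either library:

```python
import time


def fetch_with_retry(fetch, url, attempts=5, delay=0.5):
    """Call fetch(url) until it succeeds or `attempts` tries are used up.

    `fetch` is any callable that takes a URL, e.g. urllib.urlopen.
    Only IOError is retried (an assumption about how the empty reply
    is reported); any other exception propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except IOError:
            if attempt == attempts:
                raise  # out of tries: re-raise the last failure
            time.sleep(delay)  # short pause before asking again
```

With the names from the question, that would be called as bar = fetch_with_retry(urllib.urlopen, url). It only papers over the flaky server, though; switching to urllib2 avoids the HTTP/1.0 request altogether.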

Glyph answered Oct 10 '22 09:10