Why does this url raise BadStatusLine with httplib2 and urllib2?

I'm trying to fetch a page from this URL with both httplib2 and urllib2, but neither works; every attempt ends with this exception.

content = conn.request(uri="http://www.zdnet.co.kr/news/news_print.asp?artice_id=20110727092902")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1129, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 901, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 871, in _conn_request
    response = conn.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1027, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 371, in _read_status
    raise BadStatusLine(line)

The HTTP request and response headers looked like this:

http://www.zdnet.co.kr/news/news_print.asp?artice_id=20110727092902

GET /news/news_print.asp?artice_id=20110727092902 HTTP/1.1
Host: www.zdnet.co.kr
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.1) Gecko/20100101 Firefox/10.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: ko-kr,ko;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: RMID=7d83495d4f336fe0; __utma=37206251.1552605885.1328771258.1328771258.1329070845.2; __utmz=37206251.1328771258.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); ASPSESSIONIDCSQCQTDD=BCLEHPPDEPHEBJDLCFNDMKDN; __utmc=37206251; ASPSESSIONIDSSQCQQCB=MJPLMOJAFPDFCLONCANBIKHN; _EXEN=2
X-FireLogger: 1.2

HTTP/1.1 200 OK
Date: Mon, 13 Feb 2012 18:02:56 GMT
Content-Length: 19158
Content-Type: text/html;charset=UTF-8; Charset=UTF-8
Set-Cookie: ASPSESSIONIDSQSDQRDB=NGAIFHKAGDIOGEMANAOLLKKF; path=/
Cache-Control: private

Any clue?


asked Feb 13 '12 18:02

goodhyun

2 Answers

This works fine for me:

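import urllib2

# Build an opener so that custom headers are sent with every request.
opener = urllib2.build_opener()

# Present a browser-like User-Agent string.
headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:10.0.1) Gecko/20100101 Firefox/10.0.1',
}

# addheaders takes a list of (name, value) tuples; this replaces the default list.
opener.addheaders = headers.items()
response = opener.open("http://www.zdnet.co.kr/news/news_print.asp?artice_id=20110727092902")

print response.headers
print response.read()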
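For readers on Python 3, where urllib2 became urllib.request, a comparable sketch might look like the following. The names BROWSER_UA, build_request, and fetch are made up for illustration, and the User-Agent string is simply the one used above; whether the site still behaves this way is an assumption.

```python
import urllib.request

# Assumption: the browser-like string from the snippet above is still accepted.
BROWSER_UA = "Mozilla/5.0 (Windows NT 5.1; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Open url with the explicit User-Agent and return (headers, body bytes)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.headers, response.read()
```

Calling fetch("http://www.zdnet.co.kr/news/news_print.asp?artice_id=20110727092902") would perform the same request as the urllib2 version. Building the Request separately makes it easy to inspect the headers before anything goes over the network.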

The website discards all requests that occur without a User-Agent string.
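If the server drops the connection without answering, the client has no status line to parse, which is one plausible way a rejected request ends up as BadStatusLine. The sketch below reproduces that locally with Python 3's http.client (the successor of httplib). The helper names probe and _answer_once and the throwaway local server are assumptions for illustration, not the behaviour of the real site.

```python
import http.client
import socket
import threading


def _answer_once(server_sock, reply):
    """Accept one connection, read the request, send reply (if any), hang up."""
    conn, _ = server_sock.accept()
    with conn:
        conn.recv(65536)
        if reply:
            conn.sendall(reply)


def probe(reply=b""):
    """Return the BadStatusLine raised for the raw reply, or None on success."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=_answer_once, args=(server, reply), daemon=True)
    worker.start()
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()
        return None
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()
        worker.join(timeout=5)
        server.close()
```

With the default empty reply, probe() returns an exception instance. On Python 3.5 and later this is http.client.RemoteDisconnected, which subclasses BadStatusLine. A reply that does not start with "HTTP/" triggers BadStatusLine as well.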


answered Sep 18 '22 15:09

Blender

For all the people that end up here with a similar problem after installing httplib2 0.8:

Version 0.8 has a regression in its connection handling related to HTTP keep-alive. See the bug report: https://code.google.com/p/httplib2/issues/detail?id=250

There is a fix for this issue, but it has not been released so far. Until then just use httplib2 0.7.7.
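A small guard can flag the affected release at startup. This is only a sketch: AFFECTED_VERSIONS, has_keepalive_regression, and warn_if_affected are hypothetical names, and the set contains just the one version named in this answer.

```python
import warnings

# Assumption: only the release named in this answer (0.8) is treated as affected.
AFFECTED_VERSIONS = {"0.8"}


def has_keepalive_regression(version):
    """Return True if the given httplib2 version string is listed as affected."""
    return version.strip() in AFFECTED_VERSIONS


def warn_if_affected():
    """Warn when the installed httplib2 is an affected release; return True if so."""
    try:
        import httplib2
    except ImportError:
        return False  # httplib2 is not installed, so there is nothing to check
    if has_keepalive_regression(httplib2.__version__):
        warnings.warn("httplib2 %s has a keep-alive regression; consider 0.7.7"
                      % httplib2.__version__)
        return True
    return False
```

Calling warn_if_affected() once at program start is enough; pinning the dependency to 0.7.7 in the project's requirements achieves the same result without a runtime check.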

answered Sep 22 '22 15:09

smlz

Tags: python, urllib2, httplib2
