 

urllib2 not retrieving entire HTTP response

I'm perplexed as to why I'm not able to download the entire contents of some JSON responses from FriendFeed using urllib2.

>>> import urllib2
>>> stream = urllib2.urlopen('http://friendfeed.com/api/room/the-life-scientists/profile?format=json')
>>> stream.headers['content-length']
'168928'
>>> data = stream.read()
>>> len(data)
61058
>>> # We can see here that I did not retrieve the full JSON
... # given that the stream doesn't end with a closing }
... 
>>> data[-40:]
'ce2-003048343a40","name":"Vincent Racani'

How can I retrieve the full response with urllib2?

asked Dec 01 '09 by gotgenes



2 Answers

Best way to get all of the data:

fp = urllib2.urlopen("http://www.example.com/index.cfm")

response = ""
while True:
    data = fp.read()
    if not data:         # an empty read means we've reached the end of the response
        break
    response += data

print response

The reason is that a single .read() isn't guaranteed to return the entire response, given the nature of sockets. I thought this was discussed in the documentation (maybe for urllib), but I can't find it.
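
For illustration, here is a minimal sketch of the same idea factored into a helper that reads in fixed-size chunks until EOF; the read_all name and the 8192-byte chunk size are arbitrary choices, not anything from the original answer:

import urllib2

def read_all(fp, chunk_size=8192):
    # Keep reading until read() returns an empty string, which signals EOF.
    chunks = []
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return "".join(chunks)

fp = urllib2.urlopen("http://www.example.com/index.cfm")
response = read_all(fp)
print len(response)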

answered Sep 24 '22 by Jed Smith


Use tcpdump (or something like it) to monitor the actual network interactions - then you can analyze why the site is broken for some client libraries. Script the test and run it multiple times, so you can see whether the problem is consistent:

import urllib2
url = 'http://friendfeed.com/api/room/friendfeed-feedback/profile?format=json'
stream = urllib2.urlopen(url)
expected = int(stream.headers['content-length'])
data = stream.read()
datalen = len(data)
print expected, datalen, expected == datalen
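
As a sketch of scripting the repetition (the attempt count of 5 is an arbitrary choice, not part of the original answer):

import urllib2

url = 'http://friendfeed.com/api/room/friendfeed-feedback/profile?format=json'

# Run the same check several times to see whether short reads happen consistently.
for attempt in range(5):
    stream = urllib2.urlopen(url)
    expected = int(stream.headers['content-length'])
    data = stream.read()
    print attempt, expected, len(data), expected == len(data)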

The site's working consistently for me, so I can't give examples of finding failures :)

answered Sep 26 '22 by David Fraser