I'm perplexed as to why I'm not able to download the entire contents of some JSON responses from FriendFeed using urllib2.
>>> import urllib2
>>> stream = urllib2.urlopen('http://friendfeed.com/api/room/the-life-scientists/profile?format=json')
>>> stream.headers['content-length']
'168928'
>>> data = stream.read()
>>> len(data)
61058
>>> # We can see here that I did not retrieve the full JSON
... # given that the stream doesn't end with a closing }
...
>>> data[-40:]
'ce2-003048343a40","name":"Vincent Racani'
How can I retrieve the full response with urllib2?
urllib2 is deprecated in python 3. x. use urllib instaed.
1) urllib2 can accept a Request object to set the headers for a URL request, urllib accepts only a URL. 2) urllib provides the urlencode method which is used for the generation of GET query strings, urllib2 doesn't have such a function. This is one of the reasons why urllib is often used along with urllib2.
request is a Python module for fetching URLs (Uniform Resource Locators). It offers a very simple interface, in the form of the urlopen function. This is capable of fetching URLs using a variety of different protocols.
I found that time took to send the data from the client to the server took same time for both modules (urllib, requests) but the time it took to return data from the server to the client is more then twice faster in urllib compare to request. I'm working on localhost.
Best way to get all of the data:
fp = urllib2.urlopen("http://www.example.com/index.cfm")
response = ""
while 1:
data = fp.read()
if not data: # This might need to be if data == "": -- can't remember
break
response += data
print response
The reason is that .read()
isn't guaranteed to return the entire response, given the nature of sockets. I thought this was discussed in the documentation (maybe urllib
) but I cannot find it.
Use tcpdump (or something like it) to monitor the actual network interactions - then you can analyze why the site is broken for some client libraries. Ensure that you repeat multiple times by scripting the test, so you can see if the problem is consistent:
import urllib2
url = 'http://friendfeed.com/api/room/friendfeed-feedback/profile?format=json'
stream = urllib2.urlopen(url)
expected = int(stream.headers['content-length'])
data = stream.read()
datalen = len(data)
print expected, datalen, expected == datalen
The site's working consistently for me so I can't give examples of finding failures :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With