Given a standard urllib.request object, retrieved like so:
req = urllib.urlopen('http://example.com')
If I read its contents via req.read(), afterwards the request object will be empty.
Unlike normal file-like objects, however, the request object does not have a seek method, for what I am sure are excellent reasons.
However, in my case I have a function, and I want it to make certain determinations about a request and then return that request "unharmed" so that it can be read again.
I understand that one option is to re-request it. But I'd like to be able to avoid making multiple HTTP requests for the same url & content.
The only other alternative I can think of is to have the function return a tuple of the extracted content and the request object, with the understanding that anything that calls this function will have to get the content in this way.
Is that my only option?
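For illustration, here is roughly what that behaviour looks like with Python 3's urllib.request (a quick, untested sketch; the URL is just a placeholder):

import urllib.request

req = urllib.request.urlopen('http://example.com')
first = req.read()    # returns the full body
second = req.read()   # returns b'' -- the response has already been consumed
# req.seek(0)         # fails: the response stream is not seekable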
Delegate the caching to a BytesIO object (code not tested, just to give the idea):
import urllib.request
from io import BytesIO

class CachedRequest(object):
    def __init__(self, url):
        self._request = urllib.request.urlopen(url)
        self._content = None

    def __getattr__(self, attr):
        # If attr is not defined on CachedRequest, fall back to the
        # underlying request object.
        return getattr(self._request, attr)

    def read(self):
        if self._content is None:
            # First read: fetch the body once and cache it in a
            # seekable in-memory buffer (read() returns bytes).
            content = self._request.read()
            self._content = BytesIO(content)
            return content
        else:
            return self._content.read()

    def seek(self, i):
        self._content.seek(i)
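Usage would then look something like this (again untested, just to show the intent):

req = CachedRequest('http://example.com')
first = req.read()    # hits the network and caches the body
req.seek(0)           # rewind the cached copy
second = req.read()   # served from the cache, no second HTTP request
assert first == second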
If the code actually expects a real Request object (i.e. calls isinstance to check the type), then subclass Request and you don't even have to implement __getattr__.
Note that it is possible that a function checks for the exact class (in which case there is nothing you can do) or, if it's written in C, calls the method via C API calls (in which case the overridden method won't be called).
Make a subclass of urllib2.Request that uses a cStringIO.StringIO to hold whatever gets read. Then you can implement seek and so forth. Actually you could just use a string, but that'd be more work.
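For what it's worth, a rough, untested sketch of the same idea in current Python 3 terms (urllib.request instead of urllib2, io.BytesIO instead of cStringIO) is to copy the body into a buffer that is already seekable:

import io
import urllib.request

resp = urllib.request.urlopen('http://example.com')
body = io.BytesIO(resp.read())  # seekable, re-readable copy of the body

first = body.read()   # consume the buffer
body.seek(0)          # rewind
second = body.read()  # same bytes again, without a second HTTP request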