urllib.request: any way to read from it without modifying the request object?

Tags:

python

urllib

Given a standard urllib.request object, retrieved so:

req = urllib.request.urlopen('http://example.com')

If I read its contents via req.read(), the object is afterwards exhausted: further calls to read() return an empty bytestring.

Unlike normal file-like objects, however, the response object does not have a seek method, for what I am sure are excellent reasons.

However, in my case I have a function, and I want it to make certain determinations about a request and then return that request "unharmed" so that it can be read again.

I understand that one option is to re-request it. But I'd like to be able to avoid making multiple HTTP requests for the same url & content.

The only other alternative I can think of is to have the function return a tuple of the extracted content and the request object, with the understanding that anything that calls this function will have to get the content in this way.

Is that my only option?
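For illustration, the tuple-returning fallback might look like this (a hedged sketch; `inspect` and the HTML check are hypothetical names, not from the question):

```python
import urllib.request


def inspect(url):
    # Hypothetical helper: read the response once, make the
    # determination, and hand the body back alongside the verdict
    # so callers never need a second HTTP request.
    resp = urllib.request.urlopen(url)
    body = resp.read()                      # consumes the response
    looks_like_html = body.lstrip().startswith(b"<")
    return looks_like_html, body
```

Every caller must then accept the (verdict, body) pair instead of a readable object, which is exactly the awkwardness the question is trying to avoid.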

Jordan Reiter, asked Apr 17 '13



2 Answers

Delegate the caching to a BytesIO object (code not tested, just to give the idea):

import urllib.request
from io import BytesIO


class CachedRequest(object):
    def __init__(self, url):
        self._request = urllib.request.urlopen(url)
        self._content = None

    def __getattr__(self, attr):
        # If attr is not defined in CachedRequest, then get it from
        # the response object (e.g. headers, status, geturl()).
        return getattr(self._request, attr)

    def read(self):
        if self._content is None:
            # First read: fetch the body from the network and cache
            # it in an in-memory buffer for subsequent reads.
            content = self._request.read()
            self._content = BytesIO(content)
            return content
        else:
            return self._content.read()

    def seek(self, i):
        if self._content is None:
            # Populate the cache before seeking.
            self.read()
        self._content.seek(i)

If the code actually expects a real response object (i.e. it calls isinstance to check the type), then subclass the response class and you don't even have to implement __getattr__.

Note that it is possible that a function checks for the exact class (in which case there is nothing you can do), or, if it's written in C, calls the method using C-API calls (in which case the overridden method won't be called).

Bakuriu, answered Oct 24 '22

Make a subclass of urllib2's response object that uses a cStringIO.StringIO (io.BytesIO in Python 3) to hold whatever gets read. Then you can implement seek and so forth. Actually you could just use a string, but that'd be more work.
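In Python 3 terms (an untested sketch; there urllib2 became urllib.request, and io.BytesIO replaces cStringIO), the simplest variant of this idea is to buffer the whole body once and hand back a seekable file-like object:

```python
import urllib.request
from io import BytesIO


def replayable_urlopen(url):
    # Fetch once; BytesIO supports read()/seek() like a regular
    # file, so the body can be re-read without a second request.
    with urllib.request.urlopen(url) as resp:
        return BytesIO(resp.read())
```

The trade-off is that the whole body is held in memory, and response metadata (headers, status) is lost unless you also return it alongside the buffer.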

kindall, answered Oct 24 '22