urllib.request: any way to read from it without modifying the request object?

Tags:

python

urllib

Given a standard urllib.request object, retrieved so:

req = urllib.request.urlopen('http://example.com')

If I read its contents via req.read(), the object is afterwards exhausted: further calls to read() return an empty bytestring.

Unlike normal file-like objects, however, the response object does not have a seek method, for what I am sure are excellent reasons.

However, in my case I have a function, and I want it to make certain determinations about a request and then return that request "unharmed" so that it can be read again.

I understand that one option is to re-request it. But I'd like to be able to avoid making multiple HTTP requests for the same url & content.

The only other alternative I can think of is to have the function return a tuple of the extracted content and the request object, with the understanding that anything that calls this function will have to get the content in this way.

Is that my only option?
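For illustration, the tuple-returning fallback might look like this (a hedged sketch; `inspect` and the HTML check are hypothetical names, not from the question):

```python
import urllib.request


def inspect(url):
    # Hypothetical helper: read the response once, make the
    # determination, and hand the body back alongside the verdict
    # so callers never need a second HTTP request.
    resp = urllib.request.urlopen(url)
    body = resp.read()                      # consumes the response
    looks_like_html = body.lstrip().startswith(b"<")
    return looks_like_html, body
```

Every caller must then accept the (verdict, body) pair instead of a readable object, which is exactly the awkwardness the question is trying to avoid.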

Jordan Reiter, asked Apr 17 '13



2 Answers

Delegate the caching to a BytesIO object (code not tested, just to give the idea):

import urllib.request
from io import BytesIO


class CachedRequest(object):
    def __init__(self, url):
        self._request = urllib.request.urlopen(url)
        self._content = None

    def __getattr__(self, attr):
        # If attr is not defined in CachedRequest, then get it from
        # the response object (e.g. headers, status, geturl()).
        return getattr(self._request, attr)

    def read(self):
        if self._content is None:
            # First read: fetch the body from the network and cache
            # it in an in-memory buffer for subsequent reads.
            content = self._request.read()
            self._content = BytesIO(content)
            return content
        else:
            return self._content.read()

    def seek(self, i):
        if self._content is None:
            # Populate the cache before seeking.
            self.read()
        self._content.seek(i)

If the code actually expects a real response object (i.e. it calls isinstance to check the type), then subclass the response class and you don't even have to implement __getattr__.

Note that it is possible that a function checks for the exact class (in which case there is nothing you can do), or, if it's written in C, calls the method using C-API calls (in which case the overridden method won't be called).

Bakuriu, answered Oct 24 '22

Make a subclass of urllib2's response object that uses a cStringIO.StringIO (io.BytesIO in Python 3) to hold whatever gets read. Then you can implement seek and so forth. Actually you could just use a string, but that'd be more work.
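In Python 3 terms (an untested sketch; there urllib2 became urllib.request, and io.BytesIO replaces cStringIO), the simplest variant of this idea is to buffer the whole body once and hand back a seekable file-like object:

```python
import urllib.request
from io import BytesIO


def replayable_urlopen(url):
    # Fetch once; BytesIO supports read()/seek() like a regular
    # file, so the body can be re-read without a second request.
    with urllib.request.urlopen(url) as resp:
        return BytesIO(resp.read())
```

The trade-off is that the whole body is held in memory, and response metadata (headers, status) is lost unless you also return it alongside the buffer.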

kindall, answered Oct 24 '22