Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Caching in urllib2?

Is there an easy way to cache things when using urllib2 that I am over-looking, or do I have to roll my own?

like image 761
Yuvi Avatar asked Sep 29 '08 14:09

Yuvi


People also ask

What does urllib2 Urlopen do?

What is urlopen? urllib2 offers a very simple interface, in the form of the urlopen function. Just pass the URL to urlopen() to get a “file-like” handle to the remote data. like basic authentication, cookies, proxies and so on.

What is import urllib2 in Python?

Urllib package is the URL handling module for python. It is used to fetch URLs (Uniform Resource Locators). It uses the urlopen function and is able to fetch URLs using a variety of different protocols. Urllib is a package that collects several modules for working with URLs, such as: urllib.


2 Answers

If you don't mind working at a slightly lower level, httplib2 (https://github.com/httplib2/httplib2) is an excellent HTTP library that includes caching functionality.

like image 63
Corey Goldberg Avatar answered Oct 12 '22 02:10

Corey Goldberg


You could use a decorator function such as:

class cache(object):
    def __init__(self, fun):
        self.fun = fun
        self.cache = {}

    def __call__(self, *args, **kwargs):
        key  = str(args) + str(kwargs)
        try:
            return self.cache[key]
        except KeyError:
            self.cache[key] = rval = self.fun(*args, **kwargs)
            return rval
        except TypeError: # incase key isn't a valid key - don't cache
            return self.fun(*args, **kwargs)

and define a function along the lines of:

@cache
def get_url_src(url):
    return urllib.urlopen(url).read()

This is assuming you're not paying attention to HTTP Cache Controls, but just want to cache the page for the duration of the application

like image 26
Will Boyce Avatar answered Oct 12 '22 01:10

Will Boyce