
Does urllib2.urlopen() cache stuff?

The Python documentation doesn't mention this. I've recently been testing a website by repeatedly fetching it with urllib2.urlopen() to extract certain content, and I've noticed that sometimes, after I update the site, urllib2.urlopen() doesn't seem to return the newly added content. So I wonder: does it cache stuff somewhere?

Shane asked Aug 27 '10 16:08



1 Answer

Your web server or an HTTP proxy may be caching content. You can try to disable caching by adding a Pragma: no-cache request header:

import urllib2

request = urllib2.Request(url)
request.add_header('Pragma', 'no-cache')
# open() returns a file-like response object; read() gives the body
content = urllib2.build_opener().open(request).read()
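For anyone on Python 3, where urllib2 became urllib.request, the same idea might look roughly like the sketch below. The helper names build_no_cache_request and fetch_fresh are made up for illustration, and the extra Cache-Control: no-cache header is my addition (the HTTP/1.1 counterpart of Pragma), not part of the snippet above.

```python
import urllib.request


def build_no_cache_request(url):
    """Build a Request that asks caches not to answer from a stored copy."""
    request = urllib.request.Request(url)
    # Pragma: no-cache is the HTTP/1.0 header used in the answer above.
    request.add_header('Pragma', 'no-cache')
    # Cache-Control: no-cache is the HTTP/1.1 counterpart (an addition here).
    request.add_header('Cache-Control', 'no-cache')
    return request


def fetch_fresh(url):
    """Fetch url with the no-cache headers and return the raw body as bytes."""
    with urllib.request.urlopen(build_no_cache_request(url)) as response:
        return response.read()
```

Calling fetch_fresh(url) should then return the page body as bytes. Whether a particular server or proxy actually honours these headers is up to that cache, so this is something to try rather than a guarantee.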
Luca answered Oct 03 '22 22:10