I am a Python beginner. I am using urllib2 to download files. When I download a file, I specify the filename under which to save it on my hard drive. However, if I download the same file using my browser, a default filename is provided automatically.
Here is a simplified version of my code:
import urllib2

def downloadmp3(url):
    webFile = urllib2.urlopen(url)
    filename = 'temp.zip'
    # binary mode ('wb') keeps the downloaded bytes intact on every platform
    with open(filename, 'wb') as localFile:
        localFile.write(webFile.read())
The file downloads just fine, but if I type the string stored in the variable "url" into my browser, the file is given a default filename when I download it. I want to use that filename for my downloaded file, not 'temp.zip' or whatever else I assign.
How do I use urllib2 (or some other Python library) to save the file with the filename that the server I am downloading from intends it to have?
If anyone doesn't understand this question, please say so, so that I can try to make it clearer.
The filename is usually included by the server through the content-disposition header:
content-disposition: attachment; filename=foo.pdf
You have access to the headers through
result = urllib2.urlopen(...)
result.info()  # contains the headers
>>> import urllib2
>>> result = urllib2.urlopen('http://zopyx.com')
>>> print result
<addinfourl at 4302289808 whose fp = <socket._fileobject object at 0x1006dd5d0>>
>>> result.info()
<httplib.HTTPMessage instance at 0x1006fbab8>
>>> result.info().headers
['Date: Mon, 04 Apr 2011 02:08:28 GMT\r\n', 'Server: Zope/(unreleased version, python 2.4.6, linux2) ZServer/1.1 Plone/3.3.4\r\n', 'Content-Length: 15321\r\n', 'Content-Type: text/html; charset=utf-8\r\n', 'Via: 1.1 www.zopyx.com\r\n', 'Cache-Control: max-age=3600\r\n', 'Expires: Mon, 04 Apr 2011 03:08:28 GMT\r\n', 'Connection: close\r\n']
See http://docs.python.org/library/urllib2.html
But be aware that this header is optional and may be absent. In that case you need to generate a reasonable name yourself from the requested URL, e.g. from the last component of its path. Python's urlparse module (its urlparse() function) can split the URL for you.
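A minimal sketch of that logic, assuming you already have the Content-Disposition value (or None) and the URL as strings. The function name pick_filename and the 'download' default are made up for illustration; the header is parsed with the standard library's email.message.Message rather than by hand, and the import shim is only there so the snippet also runs on Python 3.

```python
import os
from email.message import Message

try:  # Python 3
    from urllib.parse import urlparse, unquote
except ImportError:  # Python 2, as used in this thread
    from urlparse import urlparse
    from urllib import unquote


def pick_filename(content_disposition, url, default='download'):
    """Prefer the server-supplied filename, else the last URL path component.

    pick_filename and `default` are hypothetical names for this sketch.
    """
    name = None
    if content_disposition:
        # Let the email parser handle quoting and parameter syntax.
        msg = Message()
        msg['Content-Disposition'] = content_disposition
        name = msg.get_filename()
    if not name:
        # Header absent or without a filename: fall back to the URL path.
        name = unquote(urlparse(url).path)
    # basename() drops any directory part a server might have sent along.
    return os.path.basename(name) or default
```

With urllib2 the header value can be read as result.info().get('Content-Disposition'), which returns None when the server did not send it.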
My issue with the previous answers is that they use the original URL, which fails if the request is redirected. Here's how I do it (note the use of result.url instead of url):
import os
import urllib2
import urlparse

result = urllib2.urlopen(url)
# result.url is the final URL, after any redirects have been followed
filename = os.path.basename(urlparse.urlparse(result.url).path)
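For completeness, here is a rough sketch of a whole download built on that idea. The names download, dest_dir and opener, and the 'download' fallback name, are my own inventions for the example; opener exists only so the function can be tried without a network connection. The import shim lets the same code run on Python 3, where urllib2 became urllib.request.

```python
import os
import shutil

try:  # Python 3
    from urllib.request import urlopen
    from urllib.parse import urlparse, unquote
except ImportError:  # Python 2, as used in this thread
    from urllib2 import urlopen
    from urlparse import urlparse
    from urllib import unquote


def download(url, dest_dir='.', opener=urlopen):
    """Save url into dest_dir, named after the final (post-redirect) URL.

    download, dest_dir and opener are hypothetical names for this sketch.
    """
    result = opener(url)
    try:
        # result.url is the address after any redirects were followed.
        path = urlparse(result.url).path
        # Decode %20 and similar escapes, then keep only the last component.
        filename = os.path.basename(unquote(path)) or 'download'
        target = os.path.join(dest_dir, filename)
        # Binary mode, copied in chunks so large files are not held in memory.
        with open(target, 'wb') as local_file:
            shutil.copyfileobj(result, local_file)
    finally:
        result.close()
    return target
```

Calling download(url) then writes the file into the current directory and returns the path it used.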