If I open a file using urllib2, like so:
remotefile = urllib2.urlopen('http://example.com/somefile.zip')
Is there an easy way to get the file name other then parsing the original URL?
EDIT: changed openfile to urlopen... not sure how that happened.
EDIT2: I ended up using:
filename = url.split('/')[-1].split('#')[0].split('?')[0]
Unless I'm mistaken, this should strip out all potential queries as well.
urllib2 is a Python module that can be used for fetching URLs. It defines functions and classes to help with URL actions (basic and digest. authentication, redirections, cookies, etc) The magic starts with importing the urllib2 module.
The Python 3 standard library has a new urllib which is a merged/refactored/rewritten version of the older modules. urllib3 is a third-party package (i.e., not in CPython's standard library).
The problem here is that urlopen returns a reference to a file object from which you should retrieve HTML. Please note that urllib. urlopen function is marked as deprecated since python 2.6. It's recommended to use urllib2.
- Python Module of the Week A library for opening URLs that can be extended by defining custom protocol handlers. The urllib2 module provides an updated API for using internet resources identified by URLs.
urllib2 is used in python 2.x, so if you use urllib2 in python 3.x, you will get this error: No module named ‘urllib2’. To fix this error, we should use python 2.x or replace urllib.request to replace it. urllib library in python 3.x contains: urllib.request for opening and reading URLs. urllib.error containing the exceptions raised by ...
urllib.request is a Python module for fetching URLs (Uniform Resource Locators). It offers a very simple interface, in the form of the urlopen function. This is capable of fetching URLs using a variety of different protocols.
The encoding is done using a function from the urllib.parse library. Note that other encodings are sometimes required (e.g. for file upload from HTML forms - see HTML Specification, Form Submission for more details). If you do not pass the data argument, urllib uses a GET request.
Did you mean urllib2.urlopen?
You could potentially lift the intended filename if the server was sending a Content-Disposition header by checking remotefile.info()['Content-Disposition']
, but as it is I think you'll just have to parse the url.
You could use urlparse.urlsplit
, but if you have any URLs like at the second example, you'll end up having to pull the file name out yourself anyway:
>>> urlparse.urlsplit('http://example.com/somefile.zip') ('http', 'example.com', '/somefile.zip', '', '') >>> urlparse.urlsplit('http://example.com/somedir/somefile.zip') ('http', 'example.com', '/somedir/somefile.zip', '', '')
Might as well just do this:
>>> 'http://example.com/somefile.zip'.split('/')[-1] 'somefile.zip' >>> 'http://example.com/somedir/somefile.zip'.split('/')[-1] 'somefile.zip'
If you only want the file name itself, assuming that there's no query variables at the end like http://example.com/somedir/somefile.zip?foo=bar then you can use os.path.basename for this:
[user@host]$ python Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) Type "help", "copyright", "credits" or "license" for more information. >>> import os >>> os.path.basename("http://example.com/somefile.zip") 'somefile.zip' >>> os.path.basename("http://example.com/somedir/somefile.zip") 'somefile.zip' >>> os.path.basename("http://example.com/somedir/somefile.zip?foo=bar") 'somefile.zip?foo=bar'
Some other posters mentioned using urlparse, which will work, but you'd still need to strip the leading directory from the file name. If you use os.path.basename() then you don't have to worry about that, since it returns only the final part of the URL or file path.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With