
urllib2 file name

If I open a file using urllib2, like so:

remotefile = urllib2.urlopen('http://example.com/somefile.zip') 

Is there an easy way to get the file name other than parsing the original URL?

EDIT: changed openfile to urlopen... not sure how that happened.

EDIT2: I ended up using:

filename = url.split('/')[-1].split('#')[0].split('?')[0] 

Unless I'm mistaken, this should strip out all potential queries as well.

defrex asked Oct 02 '08 15:10





2 Answers

Did you mean urllib2.urlopen?

If the server sends a Content-Disposition header, you could lift the intended filename from it by checking remotefile.info()['Content-Disposition']. Otherwise, I think you'll just have to parse the URL.
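The header check can be sketched as follows. This is a minimal illustration, not the answerer's code: the helper name filename_from_content_disposition is hypothetical, and it takes the raw header string rather than the response object so that it does not depend on which message class info() returns. It relies on email.message.Message.get_filename() from the standard library to parse the filename parameter.

```python
from email.message import Message
import posixpath


def filename_from_content_disposition(header_value):
    """Return the filename named in a Content-Disposition value, or None.

    header_value is the raw header string (hypothetical input), for
    example 'attachment; filename="report.zip"'.
    """
    if not header_value:
        return None
    msg = Message()
    msg['Content-Disposition'] = header_value
    name = msg.get_filename()
    if not name:
        return None
    # Drop any directory part the server may have included in the name.
    return posixpath.basename(name)
```

A caller would pass in whatever the response reports for that header, and fall back to parsing the URL when the helper returns None.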

You could use urlparse.urlsplit, but if you have any URLs like the second example, you'll end up having to pull the file name out yourself anyway:

>>> import urlparse
>>> urlparse.urlsplit('http://example.com/somefile.zip')
('http', 'example.com', '/somefile.zip', '', '')
>>> urlparse.urlsplit('http://example.com/somedir/somefile.zip')
('http', 'example.com', '/somedir/somefile.zip', '', '')

Might as well just do this:

>>> 'http://example.com/somefile.zip'.split('/')[-1]
'somefile.zip'
>>> 'http://example.com/somedir/somefile.zip'.split('/')[-1]
'somefile.zip'
Jonny Buchanan answered Oct 14 '22 04:10



If you only want the file name itself, and there are no query variables at the end (as in http://example.com/somedir/somefile.zip?foo=bar), then you can use os.path.basename for this:

[user@host]$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.basename("http://example.com/somefile.zip")
'somefile.zip'
>>> os.path.basename("http://example.com/somedir/somefile.zip")
'somefile.zip'
>>> os.path.basename("http://example.com/somedir/somefile.zip?foo=bar")
'somefile.zip?foo=bar'

Some other posters mentioned using urlparse, which will work, but you'd still need to strip the leading directory from the file name. If you use os.path.basename() then you don't have to worry about that, since it returns only the final part of the URL or file path.
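The two suggestions can also be combined: urlsplit separates the query string and fragment from the path, and basename then drops the leading directories. The sketch below is one way to do that, not code from either answer. The function name filename_from_url is hypothetical, and the try/except import is an assumption made so the same code runs on Python 3 (urllib.parse) as well as on the Python 2 urlparse module used here. It uses posixpath.basename rather than os.path.basename because URL paths always use forward slashes, whatever the local platform.

```python
import posixpath

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2, as used in the answers


def filename_from_url(url):
    """Return the last path segment of url, ignoring query and fragment."""
    # urlsplit puts '?foo=bar' and '#top' into separate fields, so only
    # the path component is passed to basename.
    path = urlsplit(url).path
    return posixpath.basename(path)
```

Unlike splitting the whole URL on '/', this is not confused by a slash inside the query string. For a URL that ends in a slash, it returns an empty string.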

Jay answered Oct 14 '22 04:10
