urllib2 file name

Tags:

python

url

urllib2

If I open a file using urllib2, like so:

remotefile = urllib2.urlopen('http://example.com/somefile.zip')

Is there an easy way to get the file name other than parsing the original URL?

EDIT: changed openfile to urlopen... not sure how that happened.

EDIT2: I ended up using:

filename = url.split('/')[-1].split('#')[0].split('?')[0]

Unless I'm mistaken, this should strip out any query string or fragment as well.
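As a quick check of that one-liner, here is a small sketch that wraps it in a function and runs it on a few sample URLs. The function name `filename_from_url` and the URLs are made up for illustration. It copes with an ordinary query string or fragment, but a slash inside the query string would still throw it off, because the split on '/' happens first.

```python
def filename_from_url(url):
    # Hypothetical helper wrapping the one-liner from the question.
    # Take the text after the last '/', then drop any '#fragment' and '?query'.
    return url.split('/')[-1].split('#')[0].split('?')[0]


print(filename_from_url('http://example.com/somefile.zip'))
print(filename_from_url('http://example.com/somedir/somefile.zip?foo=bar#top'))
# Edge case: a '/' inside the query string is split on first,
# so the result is no longer the file name.
print(filename_from_url('http://example.com/somefile.zip?next=/other/page'))
```

The first two calls print somefile.zip; the last one prints page, which is the case to watch for if your URLs can carry paths in their query strings.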


asked Oct 02 '08 15:10

defrex

2 Answers

Did you mean urllib2.urlopen?

You could potentially lift the intended filename if the server was sending a Content-Disposition header by checking remotefile.info()['Content-Disposition'], but as it is I think you'll just have to parse the url.
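If the server does send that header, the file name still has to be pulled out of a value such as `attachment; filename="somefile.zip"`. A minimal sketch of one way to do that, using the standard library's `email.message.Message` rather than hand-rolled string splitting, is below. The function name and the sample header values are assumptions for illustration; in practice the value would come from something like `remotefile.info().get('Content-Disposition')`.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not header_value:
        return None
    # Reuse the standard library's header parameter parsing instead of
    # splitting the string by hand.
    msg = Message()
    msg['Content-Disposition'] = header_value
    return msg.get_filename()


print(filename_from_content_disposition('attachment; filename="somefile.zip"'))
print(filename_from_content_disposition('inline'))
```

The first call prints somefile.zip and the second prints None, so a caller can fall back to parsing the URL whenever the result is None.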

You could use urlparse.urlsplit, but if you have any URLs like the second example below, you'll end up having to pull the file name out of the path yourself anyway:

>>> import urlparse
>>> urlparse.urlsplit('http://example.com/somefile.zip')
('http', 'example.com', '/somefile.zip', '', '')
>>> urlparse.urlsplit('http://example.com/somedir/somefile.zip')
('http', 'example.com', '/somedir/somefile.zip', '', '')

Might as well just do this:

>>> 'http://example.com/somefile.zip'.split('/')[-1]
'somefile.zip'
>>> 'http://example.com/somedir/somefile.zip'.split('/')[-1]
'somefile.zip'


answered Oct 14 '22 04:10

Jonny Buchanan

If you only want the file name itself, and assuming there are no query variables at the end (as in http://example.com/somedir/somefile.zip?foo=bar), you can use os.path.basename for this:

[user@host]$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.basename("http://example.com/somefile.zip")
'somefile.zip'
>>> os.path.basename("http://example.com/somedir/somefile.zip")
'somefile.zip'
>>> os.path.basename("http://example.com/somedir/somefile.zip?foo=bar")
'somefile.zip?foo=bar'

Some other posters mentioned using urlparse, which will work, but you'd still need to strip the leading directory from the file name. If you use os.path.basename() then you don't have to worry about that, since it returns only the final part of the URL or file path.
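The two ideas can also be combined: let urlsplit isolate the path, then take the basename of that path only. The sketch below is one possible way to do it, not something from the answers above. The helper name `url_basename` is hypothetical, and the use of `posixpath` (so that '/' is always the separator) and `unquote` (to decode escapes such as %20) are my own additions. The import fallback covers both the Python 2 `urlparse` module and its Python 3 home, `urllib.parse`.

```python
import posixpath

try:
    from urllib.parse import urlsplit, unquote  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2
    from urllib import unquote


def url_basename(url):
    # Hypothetical helper: urlsplit separates the path from any
    # '?query' and '#fragment', so only the path reaches basename.
    path = urlsplit(url).path
    # posixpath always treats '/' as the separator, on any platform.
    return unquote(posixpath.basename(path))


print(url_basename('http://example.com/somedir/somefile.zip?foo=bar'))
print(url_basename('http://example.com/somefile.zip?next=/other/page#top'))
print(url_basename('http://example.com/some%20file.zip'))
```

The first two calls print somefile.zip, including the one with a slash inside the query string, and the third prints some file.zip.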

answered Oct 14 '22 04:10

Jay
