I have the following url:
url = http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg
I would like to extract the file name in this url: 09-09-201315-47-571378756077.jpg
Once I get this file name, I'm going to save it with this name to the Desktop.
filename = **extracted file name from the url**
download_photo = urllib.urlretrieve(url, "/home/ubuntu/Desktop/%s.jpg" % (filename))
After this, I'm going to resize the photo. Once that is done, I'm going to save the resized version and append the word "_small" to the end of the filename.
downloadedphoto = Image.open("/home/ubuntu/Desktop/%s.jpg" % (filename))
resize_downloadedphoto = downloadedphoto.resize((300, 300), Image.ANTIALIAS)
resize_downloadedphoto.save("/home/ubuntu/Desktop/%s.jpg" % (filename + "_small"))
From this, what I am trying to achieve is to get two files, the original photo with the original name, then the resized photo with the modified name. Like so:
09-09-201315-47-571378756077.jpg
09-09-201315-47-571378756077_small.jpg
How can I go about doing this?
The filename is the last part of the URL, after the final slash. For example, if the URL is http://www.example.com/dir/file.html, then file.html is the file name.
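As a minimal sketch of that rule, assuming the URL carries no query string or fragment, the text after the last slash can be taken with str.rsplit:

```python
url = "http://www.example.com/dir/file.html"

# Split once from the right on "/" and keep the last piece.
filename = url.rsplit("/", 1)[-1]
print(filename)  # file.html
```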
In C#/.NET, the static GetFileName() method of the Path class extracts the file name from a path. It returns the file name and extension of the specified path string, or null if the path is null. Syntax: public static string GetFileName (string path);
You can use urllib.parse.urlparse with os.path.basename:
import os
from urllib.parse import urlparse

url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"
a = urlparse(url)
print(a.path)                    # Output: /kyle/09-09-201315-47-571378756077.jpg
print(os.path.basename(a.path))  # Output: 09-09-201315-47-571378756077.jpg
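Building on this, the two file names from the question could be derived roughly as follows. This is only a sketch: the helper name small_name is made up for illustration, and os.path.splitext is used so that "_small" lands before the extension rather than after it.

```python
import os
from urllib.parse import urlparse

def small_name(filename):
    """Hypothetical helper: insert "_small" before the file extension."""
    stem, ext = os.path.splitext(filename)
    return stem + "_small" + ext

url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"
filename = os.path.basename(urlparse(url).path)

print(filename)              # 09-09-201315-47-571378756077.jpg
print(small_name(filename))  # 09-09-201315-47-571378756077_small.jpg
```

Because the extracted name already ends in .jpg, the save path would be built with something like os.path.join("/home/ubuntu/Desktop", filename) instead of "%s.jpg" % filename, which would double the extension.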
os.path.basename(url)
Why try harder?
In [1]: os.path.basename("https://example.com/file.html")
Out[1]: 'file.html'

In [2]: os.path.basename("https://example.com/file")
Out[2]: 'file'

In [3]: os.path.basename("https://example.com/")
Out[3]: ''

In [4]: os.path.basename("https://example.com")
Out[4]: 'example.com'
Note 2020-12-20
Nobody has thus far provided a complete solution.
A URL can contain a ?[query-string] and/or a #[fragment identifier] (but only in that order).
In [1]: from os import path

In [2]: def get_filename(url):
   ...:     fragment_removed = url.split("#")[0]  # keep to left of first #
   ...:     query_string_removed = fragment_removed.split("?")[0]
   ...:     scheme_removed = query_string_removed.split("://")[-1].split(":")[-1]
   ...:     if scheme_removed.find("/") == -1:
   ...:         return ""
   ...:     return path.basename(scheme_removed)
   ...:

In [3]: get_filename("a.com/b")
Out[3]: 'b'

In [4]: get_filename("a.com/")
Out[4]: ''

In [5]: get_filename("https://a.com/")
Out[5]: ''

In [6]: get_filename("https://a.com/b")
Out[6]: 'b'

In [7]: get_filename("https://a.com/b?c=d#e")
Out[7]: 'b'
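For comparison, here is a sketch of the same idea that leans on the standard library's URL parser instead of manual splitting. The name filename_from_url is hypothetical, and the percent-decoding step is an addition that the code above does not perform.

```python
import posixpath
from urllib.parse import urlsplit, unquote

def filename_from_url(url):
    """Hypothetical helper: last path segment of a URL, percent-decoded."""
    path = urlsplit(url).path  # excludes scheme, host, ?query and #fragment
    # Take the basename first, then decode, so an encoded "/" (%2F)
    # inside the name is not mistaken for a path separator.
    return unquote(posixpath.basename(path))

print(filename_from_url("https://a.com/b?c=d#e"))         # b
print(filename_from_url("https://a.com/"))                # (empty string)
print(filename_from_url("https://a.com/my%20photo.jpg"))  # my photo.jpg
```

Unlike get_filename above, a scheme-less input such as "a.com" is treated by urlsplit as a path, so this version assumes a full URL that includes the scheme.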