The program I am currently working on retrieves URLs from a website and puts them into a list. What I want to get is the last section of the URL.
So, if the first element in my list of URLs is "https://docs.python.org/3.4/tutorial/interpreter.html"
I would want to remove everything before "interpreter.html"
.
Is there a function, library, or regex I could use to make this happen? I've looked at other Stack Overflow posts but the solutions don't seem to work.
These are two of my several attempts:
for link in link_list:
file_names.append(link.replace('/[^/]*$',''))
print(file_names)
&
for link in link_list:
file_names.append(link.rpartition('//')[-1])
print(file_names)
Use the String. replace() method to remove a trailing slash from a string, e.g. str. replace(/\/+$/, '') . The replace method will remove the trailing slash from the string by replacing it with an empty string.
Use the . lstrip() method to remove whitespace and characters only from the beginning of a string. Use the . rstrip() method to remove whitespace and characters only from the end of a string.
replace() method to remove the forward slashes from a string, e.g. new_string = string. replace('/', '') . The str. replace() method will remove the forward slashes from the string by replacing them with empty strings.
Use str. partition() to get the part of a string before the first occurrence of a specific character.
Have a look at str.rsplit
.
>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rsplit('/',1)
['https://docs.python.org/3.4/tutorial', 'interpreter.html']
>>> s.rsplit('/',1)[1]
'interpreter.html'
And to use RegEx
>>> re.search(r'(.*)/(.*)',s).group(2)
'interpreter.html'
Then match the 2nd group which lies between the last /
and the end of String. This is a greedy usage of the greedy technique in RegEx.
Debuggex Demo
Small Note - The problem with link.rpartition('//')[-1]
in your code is that you are trying to match //
and not /
. So remove the extra /
as in link.rpartition('/')[-1]
.
That doesn't need regex.
import os
for link in link_list:
file_names.append(os.path.basename(link))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With