
Remove Part of String Before the Last Forward Slash

The program I am currently working on retrieves URLs from a website and puts them into a list. What I want to get is the last section of the URL.

So, if the first element in my list of URLs is "https://docs.python.org/3.4/tutorial/interpreter.html" I would want to remove everything before "interpreter.html".

Is there a function, library, or regex I could use to make this happen? I've looked at other Stack Overflow posts but the solutions don't seem to work.

These are two of my several attempts:

for link in link_list:
   file_names.append(link.replace('/[^/]*$',''))
print(file_names)

and

for link in link_list:
   file_names.append(link.rpartition('//')[-1])
print(file_names)
freddiev4 asked Apr 15 '15 17:04



2 Answers

Have a look at str.rsplit.

>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rsplit('/',1)
['https://docs.python.org/3.4/tutorial', 'interpreter.html']
>>> s.rsplit('/',1)[1]
'interpreter.html'
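Applied to the whole list from the question, this could look like the sketch below. The contents of link_list are hypothetical sample data; indexing with [-1] instead of [1] is a small defensive choice so that a string with no slash at all is returned unchanged rather than raising an IndexError.

```python
# Hypothetical sample data standing in for the scraped list of URLs.
link_list = [
    'https://docs.python.org/3.4/tutorial/interpreter.html',
    'https://docs.python.org/3.4/tutorial/index.html',
]

# Split once from the right on '/', then keep the piece after the last slash.
file_names = [link.rsplit('/', 1)[-1] for link in link_list]
print(file_names)  # ['interpreter.html', 'index.html']
```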

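To use a regular expression instead:

>>> import re
>>> re.search(r'(.*)/(.*)',s).group(2)
'interpreter.html'

This returns the second group, which spans from the last / to the end of the string. It works because the first (.*) is greedy: it consumes as much of the string as it can, so the / in the pattern ends up matching the last slash.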
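To see the effect of greediness, it may help to compare the pattern with a lazy variant. The sketch below reuses the URL from the question; the lazy pattern (.*?) is only there for contrast and is not part of the answer.

```python
import re

s = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# Greedy: the first group grabs as much as possible, so '/' matches the LAST slash.
greedy = re.search(r'(.*)/(.*)', s)
# Lazy: the first group grabs as little as possible, so '/' matches the FIRST slash.
lazy = re.search(r'(.*?)/(.*)', s)

print(greedy.group(1))  # https://docs.python.org/3.4/tutorial
print(greedy.group(2))  # interpreter.html
print(lazy.group(2))    # /docs.python.org/3.4/tutorial/interpreter.html
```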


A small note: link.rpartition('//')[-1] in your code fails because it splits on // rather than /. Remove the extra slash, as in link.rpartition('/')[-1].
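The following sketch shows what both attempts from the question actually do, using the URL from the question. The re.sub line is one possible regex-based fix and is an addition here, not something proposed in the question. Note that str.replace treats its first argument as literal text, not as a pattern, and that the pattern '/[^/]*$' would in any case match the last segment rather than the part before it.

```python
import re

link = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# rpartition returns (head, separator, tail), split at the LAST occurrence
# of the separator. The only '//' in the URL is the one after 'https:'.
double = link.rpartition('//')
single = link.rpartition('/')
print(double)  # ('https:', '//', 'docs.python.org/3.4/tutorial/interpreter.html')
print(single)  # ('https://docs.python.org/3.4/tutorial', '/', 'interpreter.html')

# str.replace searches for the literal text '/[^/]*$', which never occurs,
# so the string comes back unchanged.
unchanged = link.replace('/[^/]*$', '')
print(unchanged == link)  # True

# re.sub does interpret a pattern: delete everything up to the last slash.
stripped = re.sub(r'^.*/', '', link)
print(stripped)  # interpreter.html
```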

Bhargav Rao answered Oct 21 '22 08:10


That doesn't need regex.

import os

for link in link_list:
    file_names.append(os.path.basename(link))
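A self-contained version of the loop above might look like this. The second URL is hypothetical and is included only to illustrate a caveat: os.path.basename works on the raw string, so a query string stays attached to the name. If that matters, one option (an assumption about the data, not part of the answer) is to extract the path with urllib.parse.urlsplit first.

```python
import os
import posixpath
from urllib.parse import urlsplit

# Hypothetical sample data; the second URL carries a query string.
link_list = [
    'https://docs.python.org/3.4/tutorial/interpreter.html',
    'https://example.com/files/report.pdf?download=1',
]

# Plain basename: everything after the last slash, query string included.
file_names = [os.path.basename(link) for link in link_list]
print(file_names)  # ['interpreter.html', 'report.pdf?download=1']

# Parse the URL first, then take the basename of its path component only.
clean_names = [posixpath.basename(urlsplit(link).path) for link in link_list]
print(clean_names)  # ['interpreter.html', 'report.pdf']
```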
TigerhawkT3 answered Oct 21 '22 06:10