I am trying to remove part of each file name, including the extension, for a list of files. I would like to loop through the items (files) and remove the extensions. I don't know how to loop through the items in the list properly, because the third parameter of re.sub must be a string, e.g. re.sub(pattern, repl, string, count=0, flags=0).
import re
file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
file_lst_trimmed =[]
for file in file_lst:
    file_lst_trimmed = re.sub(r'1.fa', '', file)
The issue arising here is that re.sub expects a string and I want it to loop through a list of strings.
Thanks for any advice!
Replace a specific string in a list. To replace a substring in every element of a list, call the string method replace() on each element inside a list comprehension. If an element does not contain the substring, replace() leaves it unchanged, so you don't need to filter elements with an if condition.
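As a minimal sketch of that approach, assuming the same file names as in the question and that only the literal '.fa' suffix should go (the variable name trimmed is just illustrative):

```python
file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']

# str.replace() works on one string at a time, so apply it per element.
# Note that it removes '.fa' wherever it occurs, not only at the end.
trimmed = [name.replace('.fa', '') for name in file_lst]

print(trimmed)  # ['cats1', 'cats2', 'dog1', 'dog2']
```

Unlike the regex answers, this keeps the trailing digit, because replace() only handles fixed substrings.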
The sub() function belongs to the regular expressions (re) module in Python. It returns a string in which all matching occurrences of the specified pattern are replaced by the replacement string.
By default, count is set to zero, which means re.sub() replaces all occurrences of the pattern in the target string.
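A small illustration of the count parameter may help here; the sample string and variable names are made up for the example:

```python
import re

text = 'a1b2c3'  # hypothetical sample string

# count defaults to 0, so every digit is replaced.
all_replaced = re.sub(r'\d', '#', text)

# count=1 stops after the first match.
first_only = re.sub(r'\d', '#', text, count=1)

print(all_replaced)  # a#b#c#
print(first_only)    # a#b2c3
```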
Regex can be used to perform various tasks in Python, such as search-and-replace operations, replacing patterns in text, and checking whether a string contains a specific pattern.
You can use a list comprehension to construct the new list with the cleaned-up file names. \d is the regex that matches a single digit, \. matches a literal dot, and $ only matches at the end of the string.
file_lst_trimmed = [re.sub(r'\d\.fa$', '', file) for file in file_lst]
The results:
>>> file_lst_trimmed
['cats', 'cats', 'dog', 'dog']
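For comparison, the explicit loop from the question can probably be kept as well, as long as each result is appended to the list instead of being assigned over it. This is only a sketch of that variant, reusing the pattern from this answer:

```python
import re

file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
file_lst_trimmed = []

for file in file_lst:
    # re.sub() receives one string per iteration; append keeps every result.
    file_lst_trimmed.append(re.sub(r'\d\.fa$', '', file))

print(file_lst_trimmed)  # ['cats', 'cats', 'dog', 'dog']
```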
You can try this:
import re
file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
final_list = [re.sub(r'\d+\.\w+$', '', i) for i in file_lst]
Output:
['cats', 'cats', 'dog', 'dog']