I am trying to change text strings of the form file1 to file01. I am really new to Python and can't figure out what should go in the 'repl' argument when using a pattern. Can anyone give me a hand?
text = 'file1 file2 file3'
x = re.sub(r'file[1-9]',r'file\0\w',text) #I'm not sure what should go in repl.
The re.sub() function replaces occurrences of a pattern in a string with another string.
sub() replaces all matching substrings with a new string, or only a specified number of them.
To replace text that matches a regular expression (regex) rather than an exact string, use sub() from the re module. In re.sub(), pass the regex pattern as the first argument, the replacement as the second, and the string to process as the third.
The count argument sets the maximum number of replacements to make. By default, count is 0, which means re.sub() replaces all occurrences of the pattern in the target string.
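As a rough illustration of the count argument, here is a minimal sketch. The sample string and the 'doc' replacement are made up for the example:

```python
import re

text = 'file1 file2 file3'

# count=0 (the default) replaces every match.
all_replaced = re.sub(r'file', 'doc', text)

# count=1 stops after the first replacement.
first_only = re.sub(r'file', 'doc', text, count=1)

print(all_replaced)  # doc1 doc2 doc3
print(first_only)    # doc1 file2 file3
```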
You could try this:
>>> import re
>>> text = 'file1 file2 file3'
>>> x = re.sub(r'file([1-9])', r'file0\1', text)
>>> x
'file01 file02 file03'
The parentheses around [1-9] capture the matched digit as the first group. In the replacement, \1 refers to that first captured group.
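If it helps to see what the group actually holds, here is a small sketch. It also uses \g<1>, which means the same as \1 but stays unambiguous when the group reference is followed by a digit in the replacement. The variable names are just for illustration:

```python
import re

text = 'file1 file2 file3'

# Inspect the first match: group(0) is the whole match, group(1) the captured digit.
match = re.search(r'file([1-9])', text)
whole = match.group(0)
digit = match.group(1)
print(whole, digit)  # file1 1

# \g<1> refers to the same group as \1.
padded = re.sub(r'file([1-9])', r'file0\g<1>', text)
print(padded)  # file01 file02 file03
```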
Also, if you don't want to add the zero for files with two or more digits, you could require the digit to be followed by whitespace or the end of the string by adding (\s|$) to the regexp:
x = re.sub(r'file([1-9](\s|$))',r'file0\1',text)
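Here is a quick check of that pattern on a string that includes a two-digit file. The second variant, using a negative lookahead (?!\d), is my own assumption about an alternative and not part of the original answer. It also pads when the digit is followed by something other than whitespace, such as a dot:

```python
import re

text = 'file1 file2 file10 file3'

# The digit must be followed by whitespace or the end of the string;
# the trailing whitespace is inside group 1, so it is preserved.
with_group = re.sub(r'file([1-9](\s|$))', r'file0\1', text)

# Alternative sketch: only require that no further digit follows.
with_lookahead = re.sub(r'file([1-9])(?!\d)', r'file0\1', text)

print(with_group)      # file01 file02 file10 file03
print(with_lookahead)  # file01 file02 file10 file03
```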
Revisiting this answer, here is a more generic solution using str.format() and a lambda expression:
import re
fmt = '{:03d}' # Let's say we want 3 digits with leading zeroes
s = 'file1 file2 file3 text40'
result = re.sub(r"([A-Za-z_]+)([0-9]+)",
                lambda x: x.group(1) + fmt.format(int(x.group(2))),
                s)
print(result)
# 'file001 file002 file003 text040'
A few details about the lambda expression:
lambda x: x.group(1) + fmt.format(int(x.group(2)))
#         ^--------^   ^-^        ^-------------^
#         filename     format     file number ([0-9]+) converted to int
#         ([A-Za-z_]+)            so format() can work with our format
I am using the expression [A-Za-z_]+, assuming the filename contains only letters and underscores besides the trailing digits. Pick a more appropriate expression if required.
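If you need this in more than one place, the same idea could be wrapped in a small helper. This is only a sketch: pad_numbers, width and name_pattern are hypothetical names I made up, and it uses str.zfill() instead of str.format() for the padding:

```python
import re

def pad_numbers(s, width=3, name_pattern=r'[A-Za-z_]+'):
    """Zero-pad the trailing number of each name in s to `width` digits."""
    # Group 1 is the name part, group 2 the trailing digits.
    pattern = re.compile(r'(' + name_pattern + r')([0-9]+)')

    def repl(m):
        # zfill() pads with leading zeros and leaves longer numbers alone.
        return m.group(1) + m.group(2).zfill(width)

    return pattern.sub(repl, s)

print(pad_numbers('file1 file2 file3 text40'))
# file001 file002 file003 text040
print(pad_numbers('file1 file12', width=2))
# file01 file12
```

Passing a different name_pattern is how you would adapt it to filenames with other characters.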
To match only files with a single digit at the end, use a word boundary \b:
>>> text = ' '.join('file{}'.format(i) for i in range(12))
>>> text
'file0 file1 file2 file3 file4 file5 file6 file7 file8 file9 file10 file11'
>>> import re
>>> re.sub(r'file(\d)\b',r'file0\1',text)
'file00 file01 file02 file03 file04 file05 file06 file07 file08 file09 file10 file11'