Using Python regular expressions only, how can I find and replace the nth occurrence of a word in a sentence? For example:
import re
s = 'cat goose mouse horse pig cat cow'
new_str = re.sub(r'cat', r'Bull', s)           # replaces both occurrences
new_str = re.sub(r'cat', r'Bull', s, count=1)  # replaces only the first
new_str = re.sub(r'cat', r'Bull', s, count=2)  # replaces the first two
In the sentence above, the word 'cat' appears twice. I want the 2nd occurrence of 'cat' to be changed to 'Bull', leaving the 1st 'cat' untouched. The final sentence should look like: "cat goose mouse horse pig Bull cow". In my code above I tried three different calls but could not get what I wanted.
By default, re.sub() replaces all occurrences of the pattern in the target string. Setting count=1 in the re.sub() call replaces only the first occurrence. In general, set count to the maximum number of replacements you want to perform, counted from the left.
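As a quick check of the count behaviour described above, here is a minimal sketch using the sentence from the question. It suggests why count alone does not solve the problem: it limits how many matches are replaced from the left, but it cannot skip the first one.

```python
import re

s = "cat goose mouse horse pig cat cow"

# No count: every match is replaced.
all_replaced = re.sub(r"cat", "Bull", s)

# count=1: only the leftmost match is replaced.
first_only = re.sub(r"cat", "Bull", s, count=1)

print(all_replaced)  # Bull goose mouse horse pig Bull cow
print(first_only)    # Bull goose mouse horse pig cat cow
```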
You can find the nth occurrence of a substring in a string by splitting at the substring with at most n+1 splits. If the resulting list has more than n+1 elements, the substring occurs more than n times.
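The splitting idea can be sketched roughly as follows. This is not a regex solution, so it sits outside the "regex only" constraint of the question. The helper name find_nth is hypothetical, and this variant uses n splits rather than n+1, which should be enough to locate the nth occurrence itself.

```python
def find_nth(haystack: str, needle: str, n: int) -> int:
    """Return the start index of the n-th (1-based) occurrence of needle, or -1."""
    if n < 1 or not needle:
        return -1
    # At most n splits, so at most n + 1 pieces.
    parts = haystack.split(needle, n)
    if len(parts) <= n:
        # Fewer than n occurrences were found.
        return -1
    # Everything before the last piece ends with the n-th occurrence.
    return len(haystack) - len(parts[-1]) - len(needle)


s = "cat goose mouse horse pig cat cow"
i = find_nth(s, "cat", 2)
replaced = s[:i] + "Bull" + s[i + len("cat"):]

print(i)         # 26
print(replaced)  # cat goose mouse horse pig Bull cow
```

Note that str.split counts non-overlapping occurrences, scanning from the left.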
S.replace(old, new[, count]) -> string
Return a copy of string S with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
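A small sketch of how str.replace behaves with and without count, again on the question's sentence. The last part is one possible non-regex workaround for the 2nd occurrence only (an assumption of this sketch, not part of the quoted documentation): cut the string after the first 'cat' with str.partition, then replace once in the remainder.

```python
s = "cat goose mouse horse pig cat cow"

print(s.replace("cat", "Bull"))     # Bull goose mouse horse pig Bull cow
print(s.replace("cat", "Bull", 1))  # Bull goose mouse horse pig cat cow

# Workaround for the 2nd occurrence: keep everything up to and including
# the first "cat", then replace the first "cat" in what follows.
head, sep, tail = s.partition("cat")
second_replaced = head + sep + tail.replace("cat", "Bull", 1)
print(second_replaced)              # cat goose mouse horse pig Bull cow
```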
Use a negative lookahead, as shown below.
>>> import re
>>> s = "cat goose mouse horse pig cat cow"
>>> re.sub(r'^((?:(?!cat).)*cat(?:(?!cat).)*)cat', r'\1Bull', s)
'cat goose mouse horse pig Bull cow'
^ : asserts that we are at the start of the string.
(?:(?!cat).)* : matches any character, zero or more times, as long as it does not begin the substring cat.
cat : matches the first cat substring.
(?:(?!cat).)* : again matches any run of characters that does not contain cat.
((?:(?!cat).)*cat(?:(?!cat).)*) : all of the above is enclosed in a capturing group, so that the captured characters can be referred to later as \1.
cat : now the following second cat string is matched; this is the one that gets replaced.
OR
>>> s = "cat goose mouse horse pig cat cow"
>>> re.sub(r'^(.*?(cat.*?){1})cat', r'\1Bull', s)
'cat goose mouse horse pig Bull cow'
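Change the number inside the {} to replace the first, second or nth occurrence of the string cat. The number is n - 1, because the final cat in the pattern is the occurrence that gets replaced. For example, to replace the third occurrence of the string cat, put 2 inside the curly braces.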
>>> re.sub(r'^(.*?(cat.*?){2})cat', r'\1Bull', "cat goose mouse horse pig cat foo cat cow")
'cat goose mouse horse pig cat foo Bull cow'
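The {n - 1} pattern above can be wrapped in a small helper. This is only a sketch under a few assumptions: the name replace_nth is hypothetical, the word is escaped with re.escape so it is treated literally, re.DOTALL is added so that . also crosses newlines, and the replacement is inserted through a function so that backslashes in it are not interpreted.

```python
import re


def replace_nth(text: str, word: str, repl: str, n: int) -> str:
    """Replace only the n-th (1-based) occurrence of word in text."""
    if n < 1:
        raise ValueError("n must be >= 1")
    w = re.escape(word)
    # Group 1 lazily captures everything up to the n-th occurrence,
    # skipping over exactly n - 1 earlier occurrences of the word.
    pattern = rf"^(.*?(?:{w}.*?){{{n - 1}}}){w}"
    # A function replacement keeps repl literal (no \1-style expansion).
    return re.sub(pattern, lambda m: m.group(1) + repl, text,
                  count=1, flags=re.DOTALL)


s = "cat goose mouse horse pig cat cow"
print(replace_nth(s, "cat", "Bull", 2))  # cat goose mouse horse pig Bull cow
print(replace_nth(s, "cat", "Bull", 1))  # Bull goose mouse horse pig cat cow
print(replace_nth(s, "cat", "Bull", 3))  # cat goose mouse horse pig cat cow
```

If the word occurs fewer than n times, the pattern does not match and the text is returned unchanged, as in the last call.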