I want to remove characters that repeat more than twice in a word. For example
"hhaaappppyyyyyyy mmoooooorning friendsssssssssssssss, good goood day"
to
"hhaappyy mmoorning friendss, good good day"
I have tried something like this, but it is not reducing to exactly 2 repetitions.
gsub('([[:alpha:]])\\1{2}', '\\1',
'hhaaappppyyyyyyy mmoooooorning friendsssssssssssssss, good goood day')
#[1] "hhappyyy mmoorning friendsssss, good god day"
Thank you.
You need to use the {2,} quantifier and two \1 backreferences in the replacement:
s <- 'hhaaappppyyyyyyy mmoooooorning friendsssssssssssssss, good goood day'
gsub('([[:alpha:]])\\1{2,}', '\\1\\1', s)
# => [1] "hhaappyy mmoorning friendss, good good day"
See the R demo.
The ([[:alpha:]])\\1{2,} pattern captures a letter into Group 1 and then matches 2 or more repetitions of that same letter, so every match is a run of at least 3 identical letters. The two \1 backreferences in the replacement replace the whole match with exactly 2 occurrences of that letter. Using two \1 placeholders is valid because every match contains at least 3 identical chars, so runs of 1 or 2 letters are never matched and stay unchanged.
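If the allowed number of repetitions may change, the same idea can be wrapped in a small function. This is only a sketch: the name squeeze_runs and its max_rep argument are made up for illustration and are not part of base R. It builds the pattern and the replacement from max_rep, so max_rep = 2 reproduces the gsub call above.

```r
# Hypothetical helper (name and arguments are illustrative, not a base R API):
# collapse any run of the same letter longer than `max_rep` down to `max_rep`.
squeeze_runs <- function(x, max_rep = 2) {
  stopifnot(max_rep >= 1)
  # One captured letter followed by at least `max_rep` more copies of it,
  # i.e. a run of max_rep + 1 or more identical letters.
  pattern <- sprintf('([[:alpha:]])\\1{%d,}', max_rep)
  # Replacement is the backreference repeated `max_rep` times, e.g. "\\1\\1".
  replacement <- strrep('\\1', max_rep)
  gsub(pattern, replacement, x)
}

s <- 'hhaaappppyyyyyyy mmoooooorning friendsssssssssssssss, good goood day'
squeeze_runs(s)
# => [1] "hhaappyy mmoorning friendss, good good day"
squeeze_runs(s, max_rep = 1)
# => [1] "hapy morning friends, god god day"
```

Note that max_rep = 1 also shortens legitimate double letters ("good" becomes "god"), which is why 2 is the usual choice for English text.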