
How to limit character repetition in a word to 2?

Tags: string, regex, r

I want to limit any character that repeats more than twice in a row within a word to two occurrences. For example, I want to go from

 "hhaaappppyyyyyyy mmoooooorning friendsssssssssssssss, good goood day"

to

 "hhaappyy mmoorning friendss, good good day"

I tried the following, but it does not reduce each run to exactly 2 repetitions:

gsub('([[:alpha:]])\\1{2}', '\\1', 
   'hhaaappppyyyyyyy mmoooooorning friendsssssssssssssss, good goood day')

#[1] "hhappyyy mmoorning friendsssss, good god day"

Thank you.

M L asked Jan 29 '23 13:01


1 Answer

You need to use the {2,} quantifier in the pattern and two \1 backreferences in the replacement:

s <- 'hhaaappppyyyyyyy mmoooooorning friendsssssssssssssss, good goood day'
gsub('([[:alpha:]])\\1{2,}', '\\1\\1', s)
# => [1] "hhaappyy mmoorning friendss, good good day"


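The ([[:alpha:]])\\1{2,} pattern captures a letter into Group 1 and then matches 2 or more further repetitions of that same letter, so each match is a run of at least 3 identical letters. The two \1 backreferences in the replacement substitute exactly 2 occurrences of the letter for the whole run. Using two \1 placeholders is safe because every match contains at least 3 identical characters, so the replacement never lengthens a run. Your original attempt failed for two reasons: \\1{2} matches exactly 3 identical letters at a time rather than the whole run, and the single \1 replacement collapses each such triple to one letter, which is why "goood" became "god" and the 7 y's became "yyy".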
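If the limit might change later, the same idea can be wrapped in a small function. This is only a sketch: the name limit_repeats and its max_run parameter are hypothetical and not part of base R. It assumes a run should be capped at max_run letters, so the pattern requires at least max_run further copies after the captured letter and the replacement repeats \1 max_run times.

```r
# Hypothetical helper (not a base R function): cap every run of the
# same letter at `max_run` characters.
limit_repeats <- function(x, max_run = 2) {
  stopifnot(max_run >= 1)
  # One captured letter followed by at least `max_run` further copies,
  # i.e. a run that is strictly longer than `max_run`.
  pattern <- sprintf("([[:alpha:]])\\1{%d,}", max_run)
  # The replacement is `max_run` backreferences, e.g. "\\1\\1" for 2.
  replacement <- strrep("\\1", max_run)
  gsub(pattern, replacement, x)
}

s <- 'hhaaappppyyyyyyy mmoooooorning friendsssssssssssssss, good goood day'
limit_repeats(s)
# [1] "hhaappyy mmoorning friendss, good good day"
```

With the default max_run = 2 this builds the same pattern and replacement as the gsub call above. Note that strrep requires R 3.3.0 or later.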

Wiktor Stribiżew answered Jan 31 '23 09:01