How can I remove repeated characters in a string with R?

I would like to implement a function in R that removes repeated characters in a string. For instance, say my function is named removeRS; it is supposed to work this way:

    removeRS('Buenaaaaaaaaa Suerrrrte')
    Buena Suerte
    removeRS('Hoy estoy tristeeeeeee')
    Hoy estoy triste

My function is going to be used with strings written in Spanish, where it is not common (or at least not correct) to find words that have more than three successive vowels. Never mind the sentiment such elongations may convey. There are, however, words with two successive consonants (especially ll and rr), and the function can leave those alone.

So, to sum up, this function should replace the letters that appear at least three times in a row with just that letter. In one of the examples above, aaaaaaaaa is replaced with a.

Could you give me any hints to carry out this task with R?

nhern121 asked Jun 22 '12 21:06


1 Answer

I have not thought about this very carefully, but here is a quick solution using backreferences in regular expressions:

    gsub('([[:alpha:]])\\1+', '\\1', 'Buenaaaaaaaaa Suerrrrte')
    # [1] "Buena Suerte"

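The parentheses () capture a letter, \\1 refers back to that captured letter, and + matches it one or more times. Put together, the pattern matches any letter that appears two or more times in a row, and the replacement \\1 keeps a single copy. Note that this also collapses legitimate doubles such as ll and rr.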
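The question asks to collapse only runs of three or more, which would leave ll and rr intact. A minimal sketch of that variant, assuming the same base-R gsub() approach: the name removeRS comes from the question, while the min_run argument is a hypothetical addition for illustration and not part of the original answer.

```r
# Sketch only: removeRS is the name proposed in the question;
# min_run is an assumed, illustrative parameter.
removeRS <- function(x, min_run = 3) {
  # The captured letter counts as the first occurrence, so the
  # backreference must repeat at least (min_run - 1) more times.
  pattern <- sprintf("([[:alpha:]])\\1{%d,}", as.integer(min_run - 1))
  gsub(pattern, "\\1", x)
}

removeRS("Buenaaaaaaaaa Suerrrrte")
# [1] "Buena Suerte"
removeRS("Hoy estoy tristeeeeeee")
# [1] "Hoy estoy triste"
removeRS("llama carro")  # doubles stay below the threshold
# [1] "llama carro"
```

One limitation of this sketch: a stretched double such as carrrro collapses to caro, since the pattern cannot know the intended spelling.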

To include other characters besides letters, replace [[:alpha:]] with a regex matching whatever you wish to include.
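As an illustration of swapping the character class, the sketch below extends it with the POSIX class [:punct:] so that repeated punctuation is collapsed as well. The input string and the variable names are made up for the example.

```r
# Illustrative input (an assumption, not from the original post).
x <- "Holaaaa!!!! 1000"

# Letters and punctuation are collapsed; digits are not in the class,
# so the run of zeros in "1000" is left alone.
collapsed <- gsub("([[:alpha:][:punct:]])\\1+", "\\1", x)
collapsed
# [1] "Hola! 1000"
```

Keeping the class narrow is a deliberate choice here: a catch-all such as (.) would also shorten numbers like 1000.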

Yihui Xie answered Sep 23 '22 14:09