How to match distinct repeated characters

Tags: regex, r

I'm trying to come up with a regex in R to match strings in which there is repetition of two distinct characters.

x <- c("aaaaaaah" ,"aaaah","ahhhh","cooee","helloee","mmmm","noooo","ohhhh","oooaaah","ooooh","sshh","ummmmm","vroomm","whoopee","yippee")

This regex matches all of the above, including strings such as "mmmm" and "ohhhh" where the repeated letter is the same in the first and the second repetition:

grep(".*([a-z])\\1.*([a-z])\\2", x, value = TRUE)

What I'd like to match in x are only those strings in which the two repeated letters are distinct:

"cooee","helloee","oooaaah","sshh","vroomm","whoopee","yippee"

How can the regex be tweaked to make sure the second repeated character is not the same as the first?


asked Jun 24 '20 09:06

Chris Ruehlemann

2 Answers

You may restrict the second char pattern with a negative lookahead:

grep(".*([a-z])\\1.*(?!\\1)([a-z])\\2", x, value=TRUE, perl=TRUE)
#                    ^^^^^


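(?!\\1)([a-z]) means: match any lowercase ASCII letter and capture it into Group 2, but only if it differs from the value in Group 1. The negative lookahead (?!\\1) fails the match at any position where the next character equals Group 1. perl=TRUE is required because R's default regex engine (TRE) does not support lookaheads.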

R demo:

x <- c("aaaaaaah" ,"aaaah","ahhhh","cooee","helloee","mmmm","noooo","ohhhh","oooaaah","ooooh","sshh","ummmmm","vroomm","whoopee","yippee")
grep(".*([a-z])\\1.*(?!\\1)([a-z])\\2", x, value=TRUE, perl=TRUE)
# => "cooee"   "helloee" "oooaaah" "sshh"    "vroomm"  "whoopee" "yippee"
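
To see what the lookahead changes, here is a minimal sketch that runs both patterns over a small subset of the input. The variable names (any_two, distinct_two, without_lookahead, with_lookahead) are only illustrative, and the leading .* is dropped on the assumption that it is not needed, since grepl() looks for a match anywhere in the string:

```r
x <- c("mmmm", "ohhhh", "sshh", "cooee")

# Pattern from the question: any two doubled letters, possibly the same letter
any_two <- "([a-z])\\1.*([a-z])\\2"
# Pattern with the lookahead: the second doubled letter must differ from Group 1
distinct_two <- "([a-z])\\1.*(?!\\1)([a-z])\\2"

# perl = TRUE is used for both so the same engine evaluates each pattern
without_lookahead <- grepl(any_two, x, perl = TRUE)
with_lookahead <- grepl(distinct_two, x, perl = TRUE)

data.frame(x, without_lookahead, with_lookahead)
```

"mmmm" and "ohhhh" should pass the first pattern but fail the second, while "sshh" and "cooee" pass both.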

answered Oct 31 '22 10:10

Wiktor Stribiżew

If you can avoid regex altogether, then I think that's the way to go. A rough example:

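# For each string, count the distinct characters that occur in a run of 2+
nrep <- sapply(
  strsplit(x, ""),  # split each string into single characters
  function(y) {
    run_lengths <- rle(y)  # consecutive runs of identical characters
    length(unique(run_lengths$values[run_lengths$lengths >= 2]))
  }
)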
x[nrep > 1]
# [1] "cooee"   "helloee" "oooaaah" "sshh"    "vroomm"  "whoopee" "yippee"
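
It may help to see what rle() returns for a single string. A small sketch for "oooaaah", with variable names (chars, runs, repeated) chosen only for illustration:

```r
# Split one string into single characters
chars <- strsplit("oooaaah", "")[[1]]

# rle() collapses consecutive identical elements into runs
runs <- rle(chars)
runs$values   # "o" "a" "h"
runs$lengths  # 3 3 1

# Keep the characters whose run length is at least 2, without duplicates
repeated <- unique(runs$values[runs$lengths >= 2])
repeated              # "o" "a"
length(repeated) > 1  # TRUE: two distinct repeated characters
```

Note that, unlike the [a-z] regex, this approach counts runs of any character (digits, spaces, punctuation), so the two methods could disagree on input that is not purely lowercase letters.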

answered Oct 31 '22 11:10

sindri_baldur
