
How to match distinct repeated characters

Tags: regex, r

I'm trying to write a regex in R that matches strings in which two distinct characters are each repeated.

x <- c("aaaaaaah" ,"aaaah","ahhhh","cooee","helloee","mmmm","noooo","ohhhh","oooaaah","ooooh","sshh","ummmmm","vroomm","whoopee","yippee")

This regex matches all of the above, including strings such as "mmmm" and "ohhhh" where the repeated letter is the same in the first and the second repetition:

grep(".*([a-z])\\1.*([a-z])\\2", x, value = TRUE)

What I'd like to match in x are these strings where the repeated letters are distinct:

"cooee","helloee","oooaaah","sshh","vroomm","whoopee","yippee"

How can the regex be tweaked to make sure the second repeated character is not the same as the first?

Chris Ruehlemann asked Jun 24 '20 09:06




2 Answers

You may restrict the second char pattern with a negative lookahead:

grep(".*([a-z])\\1.*(?!\\1)([a-z])\\2", x, value=TRUE, perl=TRUE)
#                    ^^^^^


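(?!\\1)([a-z]) matches and captures into Group 2 any lowercase ASCII letter, provided it differs from the value captured in Group 1. The lookahead requires perl=TRUE, because R's default regex engine (TRE) does not support lookarounds.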

R demo:

x <- c("aaaaaaah" ,"aaaah","ahhhh","cooee","helloee","mmmm","noooo","ohhhh","oooaaah","ooooh","sshh","ummmmm","vroomm","whoopee","yippee")
grep(".*([a-z])\\1.*(?!\\1)([a-z])\\2", x, value=TRUE, perl=TRUE)
# => "cooee"   "helloee" "oooaaah" "sshh"    "vroomm"  "whoopee" "yippee" 
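
To see what the lookahead changes, here is a minimal sketch that runs the original pattern and the restricted one side by side on a few of the strings from the question. The names loose and strict are made up for this illustration, and the leading .* is dropped on the assumption that it is redundant here, since grepl() searches anywhere in the string.

```r
# A few strings from the question: some repeat one letter twice, others repeat two different letters
x <- c("aaaah", "cooee", "mmmm", "sshh", "ohhhh")

# Illustrative names, not from the original answer
loose  <- "([a-z])\\1.*([a-z])\\2"         # second doubled letter may equal the first
strict <- "([a-z])\\1.*(?!\\1)([a-z])\\2"  # second doubled letter must differ from Group 1

grepl(loose, x, perl = TRUE)
# [1] TRUE TRUE TRUE TRUE TRUE

grepl(strict, x, perl = TRUE)
# [1] FALSE  TRUE FALSE  TRUE FALSE
```

Only "cooee" and "sshh" survive the restricted pattern, because they are the only strings here with doubled runs of two different letters.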
Wiktor Stribiżew answered Oct 31 '22 10:10


If you can avoid regex altogether, then I think that's the way to go. A rough example:

nrep <- sapply(
  strsplit(x, ""),
  function(y) {
    run_lengths <- rle(y)
    length(unique(run_lengths$values[run_lengths$lengths >= 2]))
  }
)
x[nrep > 1]
# [1] "cooee"   "helloee" "oooaaah" "sshh"    "vroomm"  "whoopee" "yippee"
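
If this check is needed in more than one place, the same idea could be wrapped in a small function. The sketch below is one possible way to do that. The name has_distinct_repeats and the min_run argument are hypothetical, not part of the answer above, and vapply() is used instead of sapply() only so that the return type is always logical.

```r
# Hypothetical helper: TRUE for strings that contain runs of length >= min_run
# of at least two different characters.
has_distinct_repeats <- function(strings, min_run = 2L) {
  vapply(
    strsplit(strings, ""),              # split each string into single characters
    function(chars) {
      runs <- rle(chars)                # run-length encode consecutive characters
      repeated <- runs$values[runs$lengths >= min_run]
      length(unique(repeated)) > 1L     # at least two different repeated characters
    },
    logical(1)
  )
}

x <- c("aaaah", "cooee", "mmmm", "sshh", "ohhhh")
x[has_distinct_repeats(x)]
# [1] "cooee" "sshh"
```

Unlike the [a-z] regex, this version counts runs of any character, including digits, spaces and uppercase letters, so the two approaches can disagree on input that is not all lowercase letters.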
sindri_baldur answered Oct 31 '22 11:10