RegEx: words with a two-letter sequence repeated twice (e.g. ABpoiuyAB, xnvXYlsdjsdXYmsd)

Tags:

regex

I had two regex tasks to do today -- and I did one properly and failed with the other. the first task was to find -- in a long, long text -- all the words beginning with "F" and ending with a vowel:

(\bf)\w*([euioay]\b)

and it worked perfectly.

the second one is way too difficult for a philology student ;-) I have to find all the words in which some two-letter sequence occurs at least twice, for example:

tatarak is TATArak, "TA" twice;
brzozowski is brZOZOwski, "ZO" twice;
loremipsrecdks is loREmipsREcdks, "RE" twice;

can I have some help please? thanks in advance ;-)


asked Mar 24 '13 15:03

user2204488

2 Answers

Let's see:

(\w{2}) matches two letters (or digits/underscore, but let's ignore that) and captures them in group number 1. Then \1 matches whatever was matched by that group. So

\b\w*(\w{2})\w*\1

is what you're looking for. You don't need {2,}: if three letters are repeated, two letters are repeated as well. Not checking for more than two also makes the regex more efficient, because matching can stop as soon as the \1 backreference has succeeded.
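As a rough illustration of the point about stopping at the backreference, here is a minimal Java sketch. The class and method names are made up for this example, and the trailing \w* is an addition that is not part of the regex above; it is there only so that group() returns the whole word rather than the text up to the end of the repeat.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this sketch only.
final class RepeatedPairWords {
    // The pattern from the answer plus a trailing \w* (an addition made here)
    // so that the match extends to the end of the word.
    private static final Pattern WHOLE_WORD =
            Pattern.compile("\\b\\w*(\\w{2})\\w*\\1\\w*");

    // Returns every word in which some two-character sequence
    // occurs at least twice without overlapping.
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WHOLE_WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(findWords(
                "tatarak brzozowski loremipsrecdks a word that does not match"));
    }
}
```

Without the trailing \w*, the same input should yield the matches tata, brzozo and loremipsre, each ending right after the second occurrence of the pair.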


answered Sep 22 '22 10:09

Tim Pietzcker

This pattern ought to do the trick

\b\w*?(\w{2})\w*?\1\w*?\b

\b is a word boundary
\w*? some number of letters (lazily)
(\w{2}) exactly two letters, match and capture
\w*? same as above
\1 the content of our two-letter capture group
\w*? same as above
\b another word boundary

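A quick test in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedPairTest {
    public static void main(String[] args) {
        // Lazy quantifiers; group 1 captures the repeated two-character sequence.
        final Pattern pattern = Pattern.compile("\\b\\w*?(\\w{2})\\w*?\\1\\w*?\\b");
        final String string = "tatarak brzozowski loremipsrecdks a word that does not match";
        final Matcher matcher = pattern.matcher(string);
        // find() moves through the input one matching word at a time.
        while (matcher.find()) {
            System.out.println("Found group " + matcher.group(1) + " in word " + matcher.group());
        }
    }
}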

Output

Found group ta in word tatarak
Found group zo in word brzozowski
Found group re in word loremipsrecdks
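The breakdown above describes \w as matching letters, but \w also accepts digits and underscores, so a token such as ab12ab would be reported too. If that matters, one possible variation (an assumption about what is wanted, not part of the pattern above) swaps \w for \p{L}, the Unicode letter category. The class and method names below are invented for this sketch, and the UNICODE_CHARACTER_CLASS flag is added on the assumption that non-ASCII letters should count as part of a word for \b.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this sketch only.
final class LetterPairs {
    // Same shape as the pattern above, with \p{L} (any letter) in place of \w.
    // UNICODE_CHARACTER_CLASS is intended to make \b Unicode-aware as well.
    private static final Pattern LETTERS_ONLY = Pattern.compile(
            "\\b\\p{L}*?(\\p{L}{2})\\p{L}*?\\1\\p{L}*?\\b",
            Pattern.UNICODE_CHARACTER_CLASS);

    // Returns the repeated two-letter sequence of the first matching word,
    // or null when no word in the text qualifies.
    static String firstRepeatedPair(String text) {
        Matcher matcher = LETTERS_ONLY.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedPair("ab12ab"));
        System.out.println(firstRepeatedPair("brzozowski"));
    }
}
```

Here ab12ab should no longer match, because the digits sit between the two occurrences of ab, while brzozowski still reports zo.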

answered Sep 24 '22 10:09

Boris the Spider
