Find words with repeating characters

Tags: regex, perl

I want to find every word in a dictionary that has the same character at the second and last positions, and exactly once more somewhere in between.

Examples:

statement - has "t" at the second, fourth and last positions
severe - has "e" at positions 2, 4 and last
abbxb - has "b" at positions 2, 3 and last

These should not match:

abab - "b" occurs only 2 times, not 3
abxxxbyyybzzzzb - "b" occurs 4 times, not 3

My grep is not working:

my @ok = grep { /^(.)(.)[^\2]+(\2)[^\2]+(\2)$/ } @wordlist;

For example,

perl -nle 'print if /^(.)(.)[^\2]+(\2)[^\2]+(\2)$/' < /usr/share/dict/words

prints, among others,

zarabanda

which is wrong.

What should be the correct regex?

EDIT:

And how can I capture the enclosed groups? E.g. for

statement - I want to capture st(a)t(emen)t for later use:

my $w1 = $1; my $w2 = $2; or something like that.
jm666 asked Jun 02 '13 00:06


4 Answers

(?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR, so what you want is:

^.             # Ignore first char
(.)            # Capture second char
(?:(?!\1).)*   # Any number of chars that aren't the second char
\1             # Second char
(?:(?!\1).)*   # Any number of chars that aren't the second char
\1\z           # Second char at the end of the string.
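
For context on why the question's [^\2] misfires: as far as I understand Perl's escape rules, a backslash followed by a digit inside a bracketed character class is read as an octal escape (the character with that code), not as a backreference. A minimal sketch of the difference, using a made-up two-letter input and group 1 instead of group 2 for brevity:

```perl
use strict;
use warnings;

# Inside [...], \1 is an octal escape (the character with code 1), not a
# backreference, so [^\1] still accepts the captured character itself.
my $class_hit = ('aa' =~ /^(.)[^\1]$/) ? 1 : 0;

# The negative lookahead really compares against the captured character.
my $lookahead_hit = ('aa' =~ /^(.)(?!\1).$/) ? 1 : 0;

print "character class: $class_hit, lookahead: $lookahead_hit\n";
```

The character-class version matches "aa" even though the second "a" is the captured character, while the lookahead version rejects it.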

So you get:

perl -ne'print if /^. (.) (?:(?!\1).)* \1 (?:(?!\1).)* \1$/x' \
   /usr/share/dict/words

To capture what's in between, add parens around both (?:(?!\1).)*.

perl -nle'print "$2:$3" if /^. (.) ((?:(?!\1).)*) \1 ((?:(?!\1).)*) \1\z/x' \
   /usr/share/dict/words
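
If the captures are needed inside a script rather than a one-liner, something along these lines might do. It is only a sketch: the word list is just the examples from the question, and the names $w1, $w2 and %parts are my own choice, not anything the pattern requires.

```perl
use strict;
use warnings;

# Group 1 is the second character; groups 2 and 3 are the two in-between runs.
my $re = qr/^. (.) ((?:(?!\1).)*) \1 ((?:(?!\1).)*) \1\z/x;

# Sample words taken from the question.
my @wordlist = qw(statement severe abbxb abab abxxxbyyybzzzzb zarabanda);

my %parts;
for my $word (@wordlist) {
    # In list context a successful match returns the three captures.
    if (my ($char, $w1, $w2) = $word =~ $re) {
        $parts{$word} = [$w1, $w2];
        print "$word: '$w1' '$w2'\n";
    }
}
```

Note that an in-between run can be empty, as in abbxb, where the first capture is the empty string.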
ikegami answered Nov 15 '22 13:11


This is the regex that should work for you:

^.(.)(?=(?:.*?\1){2})(?!(?:.*?\1){3}).*?\1$

Live Demo: http://www.rubular.com/r/bEMgutE7t5
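
To check the pattern locally, a small script like the following could be used. This is a sketch that only exercises the example words from the question; the variable names are arbitrary.

```perl
use strict;
use warnings;

# Positive lookahead: at least two more occurrences of the second character.
# Negative lookahead: but not three. The tail pins the last one to the end.
my $re = qr/^.(.)(?=(?:.*?\1){2})(?!(?:.*?\1){3}).*?\1$/;

# Sample words taken from the question.
my @wordlist = qw(statement severe abbxb abab abxxxbyyybzzzzb zarabanda);

my @ok = grep { $_ =~ $re } @wordlist;
print "$_\n" for @ok;
```

With these inputs only statement, severe and abbxb should be kept.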

anubhava answered Nov 15 '22 15:11


Using lookahead:

/^.(.)(?!(?:.*\1){3}).*\1(.*)\1$/

Meaning:

/^.(.)(?!(?:.*\1){3})  # capture the second character, provided it does not
                       # occur three or more times after the 2nd position
.*\1(.*)\1$/           # match the captured char two more times, the last one at the end
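
As written, the pattern captures only the text between the last two occurrences. If both in-between pieces are wanted, as in the question's edit, one possible variant (my own adjustment, not part of the pattern above) turns the first .* into a capture group as well:

```perl
use strict;
use warnings;

# Same lookahead as above, but the first in-between run is captured too,
# so group 2 and group 3 hold the two pieces.
my $re = qr/^.(.)(?!(?:.*\1){3})(.*)\1(.*)\1$/;

my ($char, $w1, $w2) = 'statement' =~ $re;
print "$char: '$w1' '$w2'\n";
```

Because the lookahead rules out a third later occurrence, neither captured piece can contain the repeated character, even though .* is greedy.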
perreal answered Nov 15 '22 13:11


my @ok = grep { /^.(\w)/ && /^.$1[^$1]*?$1[^$1]*$1$/ } @wordlist;
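
A variation on the same two-step idea, assuming the word list may also contain non-word characters: capture the second character into a lexical and quote it with \Q...\E before interpolating, so that a character such as "." is taken literally. The entry a.b.c. below is a made-up test word, not something from the question.

```perl
use strict;
use warnings;

# Sample words from the question plus one hypothetical entry with dots.
my @wordlist = qw(statement severe abbxb abab abxxxbyyybzzzzb zarabanda a.b.c.);

my @ok = grep {
    my ($c) = /^.(.)/;    # second character, or undef for one-letter words
    defined $c
        && /^.\Q$c\E[^\Q$c\E]*\Q$c\E[^\Q$c\E]*\Q$c\E\z/;
} @wordlist;

print "$_\n" for @ok;
```

Keeping the character in $c rather than reading $1 also avoids relying on the capture variables of an earlier match.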
David answered Nov 15 '22 13:11