
Using regular expressions to find a word with the five letters abcde, each letter appearing exactly once, in any order, with no breaks in between

Tags: regex, perl

For example, the word debacle qualifies because of debac, but seabed does not because (1) no 5-character sequence in it contains a c, and (2) the letter e appears twice. As another example, feedback qualifies because of edbac. The solution must use only regular expressions.

A strategy I attempted to implement was: match the first letter if it's inside [a-e], and remember it. Then find the next letter in [a-e] but not the first letter. And so on. I wasn't sure what the syntax was (or even if some syntax existed) so my code didn't work:

open(DICT, "dictionary.txt");
@words = <DICT>;

foreach my $word(@words){

if ($word =~ /([a-e])([a-e^\1])([a-e^\1^\2])([a-e^\1^\2^\3])([a-e^\1^\2^\3^\4])/
){
    print $word;
}
}

I was also thinking of using (?=regex) and \G but I wasn't sure how it would work out.
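
For reference, the "remember each letter and exclude it later" strategy can be written in valid Perl syntax. A character class cannot contain a backreference, so `[a-e^\1]` does not mean "a to e except group 1". One way to express the exclusion is a negative lookahead before each later letter. The following is a minimal sketch, not the asker's original code, and `has_distinct_abcde_run` is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

# Sketch: each (?!...) refuses to repeat a letter captured earlier,
# so the five [a-e] characters must all be different.
my $re = qr/
    ([a-e])
    (?!\1)          ([a-e])
    (?!\1|\2)       ([a-e])
    (?!\1|\2|\3)    ([a-e])
    (?!\1|\2|\3|\4) ([a-e])
/x;

# Hypothetical helper: returns 1 if the word contains such a run, else 0.
sub has_distinct_abcde_run {
    my ($word) = @_;
    return $word =~ $re ? 1 : 0;
}

# Words taken from the examples in the question.
for my $word (qw( debacle feedback seabed )) {
    print "$word\n" if has_distinct_abcde_run($word);
}
```

Five different letters drawn from a five-letter set must be a, b, c, d and e once each, which is why no further check is needed.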

kyothine asked Jun 21 '12 17:06



1 Answer

/
   (?= .{0,4}a )
   (?= .{0,4}b )
   (?= .{0,4}c )
   (?= .{0,4}d )
   (?= .{0,4}e )
/xs
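
Each lookahead above asserts that one of the letters occurs somewhere in the next five characters from the same starting position. Since five different letters must all fit in a five-character window, each appears exactly once. As a rough sketch of how the pattern might be exercised against the words from the question (`has_abcde_window` is a hypothetical helper name, not part of the answer):

```perl
use strict;
use warnings;

# The lookahead pattern from the answer, compiled once.
my $re = qr/
   (?= .{0,4}a )
   (?= .{0,4}b )
   (?= .{0,4}c )
   (?= .{0,4}d )
   (?= .{0,4}e )
/xs;

# Hypothetical helper: returns 1 if some 5-character window of the
# word holds a, b, c, d and e, else 0.
sub has_abcde_window {
    my ($word) = @_;
    return $word =~ $re ? 1 : 0;
}

# debacle and feedback are reported as matches; seabed is not.
for my $word (qw( debacle feedback seabed )) {
    printf "%-8s %s\n", $word, has_abcde_window($word) ? 'match' : 'no match';
}
```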

Generating a pattern from all permutations of the letters probably results in faster matching.

use Algorithm::Loops qw( NextPermute );
my @pats;
my @chars = 'a'..'e';
do { push @pats, quotemeta join '', @chars; } while NextPermute(@chars);
my $re = join '|', @pats;

abcde|abced|abdce|abdec|abecd|abedc|acbde|acbed|acdbe|acdeb|acebd|acedb|adbce|adbec|adcbe|adceb|adebc|adecb|aebcd|aebdc|aecbd|aecdb|aedbc|aedcb|bacde|baced|badce|badec|baecd|baedc|bcade|bcaed|bcdae|bcdea|bcead|bceda|bdace|bdaec|bdcae|bdcea|bdeac|bdeca|beacd|beadc|becad|becda|bedac|bedca|cabde|cabed|cadbe|cadeb|caebd|caedb|cbade|cbaed|cbdae|cbdea|cbead|cbeda|cdabe|cdaeb|cdbae|cdbea|cdeab|cdeba|ceabd|ceadb|cebad|cebda|cedab|cedba|dabce|dabec|dacbe|daceb|daebc|daecb|dbace|dbaec|dbcae|dbcea|dbeac|dbeca|dcabe|dcaeb|dcbae|dcbea|dceab|dceba|deabc|deacb|debac|debca|decab|decba|eabcd|eabdc|eacbd|eacdb|eadbc|eadcb|ebacd|ebadc|ebcad|ebcda|ebdac|ebdca|ecabd|ecadb|ecbad|ecbda|ecdab|ecdba|edabc|edacb|edbac|edbca|edcab|edcba

(This will get optimised into a trie in Perl 5.10+. Before 5.10, use Regexp::List.)
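
If installing Algorithm::Loops is not an option, the same alternation can be built with core Perl only. The sketch below swaps NextPermute for a small recursive `permutations` helper (a hypothetical name, written here for illustration) and then compiles the alternation and applies it to the words from the question. It assumes the input letters are distinct.

```perl
use strict;
use warnings;

# Hypothetical helper standing in for NextPermute so the sketch has
# no CPAN dependency. Returns a list of array refs, one per ordering.
sub permutations {
    my @items = @_;
    return ([]) unless @items;
    my @result;
    for my $i (0 .. $#items) {
        my @rest = @items;
        my ($picked) = splice @rest, $i, 1;
        push @result, [ $picked, @$_ ] for permutations(@rest);
    }
    return @result;
}

# Build the alternation the same way as the answer does.
my @pats = map { quotemeta join '', @$_ } permutations('a' .. 'e');
my $re   = join '|', @pats;
my $qr   = qr/$re/;

# Prints the words that contain one of the 120 orderings.
for my $word (qw( debacle feedback seabed )) {
    print "$word\n" if $word =~ $qr;
}
```

Compiling once with `qr//` avoids rebuilding the pattern for every word when looping over a dictionary file.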

ikegami answered Oct 18 '22 11:10
