
Is it possible to generate a (compact) regular expression for an anagram of an arbitrary string?

Problem: write a program in any language which, given a string of characters, generates a regex that matches any anagram of the input string. For all input strings longer than some length N, the regex must be shorter than the "brute force" solution that lists all possible anagrams separated by "|", and the length of the regex should grow "slowly" as the input string grows (ideally linearly, but possibly as n ln n).

Can you do it? I've tried, but my attempts are so far from succeeding that I'm beginning to doubt it's possible. The only reason I ask is that I thought I had seen a solution on another site, but much pointless googling failed to uncover it a second time.
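For reference, the "brute force" baseline mentioned above could be sketched roughly as follows. This is only an illustration: the names `escapeRegex` and `bruteForceAnagramRegex` are hypothetical, and the sketch assumes the input consists of single UTF-16 code units. A string of n characters with repeat counts k1, ..., km has n!/(k1! ... km!) distinct anagrams, so the alternation grows factorially.

```javascript
// Hypothetical helper: escape characters that are special inside a regex.
function escapeRegex(ch) {
    return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical baseline: list every distinct anagram, separated by "|".
function bruteForceAnagramRegex(input) {
    var results = [];
    // Sorting puts equal characters next to each other, which makes
    // duplicate permutations easy to skip.
    var chars = input.split('').sort();

    function permute(prefix, remaining) {
        if (remaining.length === 0) {
            results.push(prefix);
            return;
        }
        for (var i = 0; i < remaining.length; i++) {
            // Skip a repeated character so each distinct anagram appears once.
            if (i > 0 && remaining[i] === remaining[i - 1]) continue;
            var rest = remaining.slice(0, i).concat(remaining.slice(i + 1));
            permute(prefix + escapeRegex(remaining[i]), rest);
        }
    }

    permute('', chars);
    return '^(?:' + results.join('|') + ')$';
}

console.log(bruteForceAnagramRegex('abc'));
// prints "^(?:abc|acb|bac|bca|cab|cba)$"
```

Even for a short input with no repeated characters the output is already long, which is why a more compact construction is wanted.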

Mike Sokolov asked Sep 17 '11 22:09



1 Answer

I think this JavaScript code meets your specification. The regex length grows linearly with the length of the input. The generated regex uses positive lookaheads to match any anagram of the input string. The lookahead part ensures that every character of the input string occurs in the test string at least as many times as it does in the input, regardless of order, and the matching part ensures that the test string has the same length as the input string (the one the regex is built from). Since the required counts add up to exactly that length, the two conditions together force the character counts to match exactly.

function anagramRegexGenerator(input) {
    var lookaheadPart = '';
    var matchingPart = '^';
    var positiveLookaheadPrefix = '(?=';
    var positiveLookaheadSuffix = ')';

    // Count how many times each character occurs in the input.
    var inputCharacterFrequencyMap = {};
    for (var i = 0; i < input.length; i++) {
        if (!inputCharacterFrequencyMap[input[i]]) {
            inputCharacterFrequencyMap[input[i]] = 1;
        } else {
            ++inputCharacterFrequencyMap[input[i]];
        }
    }

    // Emit one lookahead per distinct character, requiring at least as many
    // occurrences as in the input, plus one '.' per character for the length check.
    for (var j in inputCharacterFrequencyMap) {
        // A space becomes \s (note: this also matches other whitespace);
        // regex metacharacters are escaped so that they match literally.
        var token = (j == ' ') ? '\\s' : j.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        lookaheadPart += positiveLookaheadPrefix;
        for (var k = 0; k < inputCharacterFrequencyMap[j]; k++) {
            lookaheadPart += '.*' + token;
            matchingPart += '.';
        }
        lookaheadPart += positiveLookaheadSuffix;
    }
    matchingPart += '$';
    return lookaheadPart + matchingPart;
}

Sample input and output:

anagramRegexGenerator('aaadaaccc')
//generates the following string.
"(?=.*a.*a.*a.*a.*a)(?=.*d)(?=.*c.*c.*c)^.........$"
anagramRegexGenerator('abcdef ghij'); 
//generates the following string.
"(?=.*a)(?=.*b)(?=.*c)(?=.*d)(?=.*e)(?=.*f)(?=.*\s)(?=.*g)(?=.*h)(?=.*i)(?=.*j)^...........$"
//test run returns true
/(?=.*a)(?=.*b)(?=.*c)(?=.*d)(?=.*e)(?=.*f)(?=.*\s)(?=.*g)(?=.*h)(?=.*i)(?=.*j)^...........$/.test('acdbefghij ')
//or using the RegExp object
//this returns true
new RegExp(anagramRegexGenerator('abcdef ghij')).test('acdbefghij ') 
//this returns false
new RegExp(anagramRegexGenerator('abcdef ghij')).test('acdbefghijj') 
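As a rough sanity check of the approach, the sketch below compares a lookahead-based regex against a plain sort-and-compare anagram test and prints how the regex length grows with the input. It is a self-contained variation, not the exact function above: the names `buildAnagramRegex` and `isAnagram` are hypothetical, the `^` anchor is placed before the lookaheads, a space is matched literally rather than as `\s`, and the input is assumed to contain no line terminators (which `.` does not match).

```javascript
// Hypothetical compact generator following the same lookahead idea.
function buildAnagramRegex(input) {
    // Count occurrences of each character.
    var counts = {};
    for (var i = 0; i < input.length; i++) {
        counts[input[i]] = (counts[input[i]] || 0) + 1;
    }
    // One lookahead per distinct character, with metacharacters escaped.
    var lookaheads = '';
    for (var ch in counts) {
        var escaped = ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        lookaheads += '(?=';
        for (var k = 0; k < counts[ch]; k++) {
            lookaheads += '.*' + escaped;
        }
        lookaheads += ')';
    }
    // One '.' per input character fixes the total length.
    return '^' + lookaheads + '.'.repeat(input.length) + '$';
}

// Reference check: two strings are anagrams if their sorted characters agree.
function isAnagram(a, b) {
    return a.split('').sort().join('') === b.split('').sort().join('');
}

var source = 'abcdef ghij';
var re = new RegExp(buildAnagramRegex(source));
['acdbefghij ', 'acdbefghijj', 'jihg fedcba', 'abcdef ghi'].forEach(function (c) {
    // The two columns should agree for every candidate.
    console.log(JSON.stringify(c), re.test(c), isAnagram(source, c));
});

// For inputs with all-distinct characters the length is 8n + 2.
['ab', 'abcd', 'abcdefgh'].forEach(function (s) {
    console.log(s.length, buildAnagramRegex(s).length);
});
// prints "2 18", "4 34", "8 66"
```

Each input character contributes a constant number of regex characters, which is where the linear growth comes from.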
Narendra Yadala answered Sep 30 '22 19:09
