Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

RegEx calculating number of distinct permutations

Tags:

regex

php

ruby

So this is a bit unusual use of RegEx; I want to calculate number (or indicate infinite if suitable) of distinct strings that would be matched by a specific pattern.

For example let's consider [a-zA-Z] which would yield 52, [a-zA-Z]{1,2} which would yield 2652 (52+52×52−52×2; subtracting 52×2 for strings like aa,MM which are not distinct) or [a-zA-Z]+ which would be ∞.

Of course I'd like this mechanism to be able to deal with more complex regular expressions than that. I'm particularly interested in solutions for PHP and Ruby. Is this even possible?

like image 802
Bart Platak Avatar asked Sep 14 '12 14:09

Bart Platak


1 Answers

Regular expressions are used to match a given string by comparing it to a given pattern. Any given regex can match a large number of strings, the longer the regex the more strings it can match.

In my opinion, what you are after cannot be done with regular expressions. You can write a program which deconstructs the regular expression and tries to guess the amount of strings you could match. That being said however, the construction of such program will most likely not be trivial.

For instance, in your case, [a-zA-Z] will not only match a through z (and the same for the upper case variant), but it will also match any string which contains those letters, which basically is any string you can ever imagine which contains at least one of those letters.

Adding the ^ and $ anchors might reduce the amount of hits, but then again, you will still have more than 48 since sometimes, you could also argue that {EmptyString}a{EmptyString} could also be matched by ^a$, which makes the amount of possible results quite huge.

like image 132
npinti Avatar answered Oct 11 '22 11:10

npinti