Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find minimum, maximum length strings generated given a regular expression? [closed]

How to find minimum and maximum length given a regular expression ?

For example

[1-9]?[0-9]

This regular expression can generate a minimum 1 (0 or 1 0r 2.... or 9) and a maximum of string length 2 (10 or 11 or 12 or ......19 or 20 or 21...........or 99)

Similarly, can anyone provide a function which can calculate minimum and maximum length given a regular expression? Which can take below regex as input?

^[a-zA-Z0-9][a-zA-Z0-9.-]{0,64}[a-zA-Z0-9]$
^[a-zA-Z0-9._-]{1,255}$
^[a-zA-Z0-9 !#$'()*+,./:;=?@\\^_`~-]{1,30}$
^[]a-zA-Z0-9 !#$'()*+,./:;=?@[^_`{|}~-]{0,50}$
^((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])$
like image 852
user3362766 Avatar asked Feb 27 '14 23:02

user3362766


People also ask

How do I limit the length of a string in regex?

By combining the interval quantifier with the surrounding start- and end-of-string anchors, the regex will fail to match if the subject text's length falls outside the desired range.

What does * do in regex?

The Match-zero-or-more Operator ( * ) This operator repeats the smallest possible preceding regular expression as many times as necessary (including zero) to match the pattern. `*' represents this operator. For example, `o*' matches any string made up of zero or more `o' s.


2 Answers

Regular expressions consist of just a very small set of elements.

  1. atoms (e. g. a or [a-k] or .),
  2. choices (e. g. r1|r2),
  3. repetitions (e. g. r{3,10}, r+, r*, r?).
  4. groups (e. g. (r)) which can be subject to repetitions or choices.
  5. Specials (e. g. ^, $).

That's more or less it unless we want to add non-consuming look-aheads and similar, but they are not part of your example input, so I will not consider these.

How long (minimum/maximum) can these be?

  1. 1 / 1 (atoms are constant in size)
  2. min(minlen(r) for r in choices) / max(maxlen(r) for r in choices)
  3. minlen(r) * minrepretition / maxlen(r) * maxrepetition
  4. minlen(r) / maxlen(r)
  5. 0 (positional parameters match the empty string).

So, what you will need is a regexp parser (as Hugh Bothwell suggested it in his answer) which returns you sth like an abstract syntax tree (absy) of a given regexp; this absy then can be analyzed using the rules I sketched above to find the minimal or maximal length for a string the given regexp can match.

like image 160
Alfe Avatar answered Sep 28 '22 17:09

Alfe


There is some starting code at http://pyparsing.wikispaces.com/file/view/invRegex.py for a regex parser in pyparsing; it shouldn't be hard to modify to do what you want.

Some tutorials can be found at http://pyparsing.wikispaces.com/Examples

like image 22
Hugh Bothwell Avatar answered Sep 28 '22 19:09

Hugh Bothwell