Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Maximum match length of a regular expression

Tags:

python

regex

What is the easiest way to determine the maximum match length of a regular expression?

Specifically, I am using Python's re module.

E.g. for foo((bar){2,3}|potato) it would be 12.

Obviously, regexes using operators like * and + have theoretically unbounded match lengths; in those cases returning an error or something is fine. Giving an error for regexes using the (?...) extensions is also fine.

I would also be ok with getting an approximate upper bound, as long as it is always greater than the actual maximum length, but not too much greater.

like image 663
adw Avatar asked Oct 31 '10 14:10

adw


1 Answers

Using pyparsing's invRegex module:

import invRegex
data='foo(bar{2,3}|potato)'    
print(list(invRegex.invert(data)))
# ['foobarr', 'foobarrr', 'foopotato']    
print(max(map(len,invRegex.invert(data))))
# 9

Another alternative is to use ipermute from this module.

import inverse_regex
data='foo(bar{2,3}|potato)'
print(list(inverse_regex.ipermute(data)))
# ['foobarr', 'foobarrr', 'foopotato']
print(max(map(len,inverse_regex.ipermute(data))))
# 9
like image 63
unutbu Avatar answered Sep 27 '22 20:09

unutbu