What is the easiest way to determine the maximum match length of a regular expression?
Specifically, I am using Python's re
module.
E.g. for foo((bar){2,3}|potato)
it would be 12.
Obviously, regexes using operators like *
and +
have theoretically unbounded match lengths; in those cases returning an error or something is fine. Giving an error for regexes using the (?...)
extensions is also fine.
I would also be ok with getting an approximate upper bound, as long as it is always greater than the actual maximum length, but not too much greater.
Using pyparsing's invRegex module:
import invRegex
data='foo(bar{2,3}|potato)'
print(list(invRegex.invert(data)))
# ['foobarr', 'foobarrr', 'foopotato']
print(max(map(len,invRegex.invert(data))))
# 9
Another alternative is to use ipermute
from this module.
import inverse_regex
data='foo(bar{2,3}|potato)'
print(list(inverse_regex.ipermute(data)))
# ['foobarr', 'foobarrr', 'foopotato']
print(max(map(len,inverse_regex.ipermute(data))))
# 9
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With