 

A regex that cannot match (for a generated expression)

Tags:

python

regex

In my code I generate a regular expression from a list of subexpressions. Joining the expressions works fine if I put each of them in a non-capturing group (?:…):

# concatenation:
joined_expr = ''.join('(?:{})'.format(expr) for expr in subexpression)
# disjunction:
joined_expr = '|'.join('(?:{})'.format(expr) for expr in subexpression)

The problem is that the joined expression is itself a subexpression of a bigger expression, and the list subexpression could be empty, yet the joined expression must not match the empty string.
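A quick check of the problem, reusing the two join expressions from the question with a hypothetical empty list: both joins collapse to the empty pattern, and the empty pattern matches the empty string.

```python
import re

# Hypothetical input: the list of generated subexpressions happens to be empty.
subexpression = []

concatenation = ''.join('(?:{})'.format(expr) for expr in subexpression)
disjunction = '|'.join('(?:{})'.format(expr) for expr in subexpression)

# Both joins produce '' for an empty list...
print(repr(concatenation), repr(disjunction))  # → '' ''

# ...and the empty pattern matches the empty string.
print(re.fullmatch(disjunction, '') is not None)  # → True
```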

So what would be the easiest way to write a regular expression that cannot match anything? Would (?:(?!.).) work? If not, why not? Would Python's re engine recognize my attempt to create a failing branch and optimize it?
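The candidate (?:(?!.).) can be checked directly: the lookahead (?!.) only succeeds where . cannot match, so the . that follows must then fail. The sketch below runs it against a few sample strings (chosen for illustration only, so this is a spot check rather than a proof), both with and without re.DOTALL, because without that flag . does not match a newline.

```python
import re

NEVER = r'(?:(?!.).)'

# Illustrative samples, including newlines and the empty string.
samples = ['', 'a', 'abc', '\n', 'line one\nline two\n']

for flags in (0, re.DOTALL):
    for text in samples:
        # search() tries every start position, so None means no match anywhere.
        assert re.search(NEVER, text, flags) is None

print('no matches found')
```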

asked Mar 12 '26 at 22:03 by kay


1 Answer

You can spare the regex engine most of its work by using:

 \Zx # or '$s' to match a literal after the end of the string

It is much cheaper than (?:(?!.).) on long strings, and you get the same result: no match.
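Applied to the question's generator, one possible sketch uses \Zx as the fallback when the list is empty. The helper name join_alternatives is hypothetical; in Python's re, \Z matches only at the very end of the string, so no x can ever follow it.

```python
import re

# Never matches: \Z only matches at the very end of the string,
# so the literal 'x' after it can never be consumed.
NEVER_MATCH = r'\Zx'

def join_alternatives(subexpressions):
    """Hypothetical helper: join subexpressions as a disjunction and fall
    back to a never-matching pattern when the list is empty."""
    if not subexpressions:
        return '(?:{})'.format(NEVER_MATCH)
    return '|'.join('(?:{})'.format(expr) for expr in subexpressions)

# Embedded in a larger pattern, the empty case now makes the whole
# match fail instead of silently matching the empty string.
pattern = re.compile('foo(?:{})bar'.format(join_alternatives([])))
print(pattern.search('foobar'))  # → None

# A non-empty list still behaves like an ordinary alternation.
print(re.fullmatch(join_alternatives(['a', 'b+']), 'bbb') is not None)  # → True
```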

Here is a short online test against a text of 4231 characters:

  • Test negative lookahead - (?:(?!.).) - 16924 steps

  • Test after end anchor - \Zx - 2 steps
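The step counts above come from an online tester, whose engine is not necessarily Python's re. To see whether Python's own re shows a similar gap (the last part of the question), a rough timing sketch with timeit could look like this. The sample text is made up, and absolute timings depend on the machine and Python version, so no particular ratio is claimed here.

```python
import re
import timeit

# Hypothetical 4231-character haystack; the online test used its own text.
text = ('lorem ipsum dolor sit amet\n' * 200)[:4231]

patterns = {
    'negative lookahead': re.compile(r'(?:(?!.).)'),
    'after end anchor': re.compile(r'\Zx'),
}

for name, compiled in patterns.items():
    # Neither pattern can match, so search() has to try every start position.
    assert compiled.search(text) is None
    seconds = timeit.timeit(lambda: compiled.search(text), number=1000)
    print('{}: {:.4f} s for 1000 searches'.format(name, seconds))
```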

answered Mar 15 '26 at 10:03 by Giuseppe Ricupero


