Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python regular expressions does ordering of alternatives matter for speed/choosing between alternatives

Tags:

python

regex

I am trying to match (and remove) any of 4000 expressions.

If I put the most common matches at the front will that speed matching (or is it undefined)

although typically exclusive, I sometimes have default cases: 'ax*|a(0-9)|', ie 'a', but I want a greedy match if possible. is it sufficient to reorder 'a(0-9)|ax*' or is this not guaranteed by the specification?

like image 554
seanv507 Avatar asked Jan 20 '26 12:01

seanv507


1 Answers

Does ordering of alternatives matter for speed/choosing between alternatives?

Yes, it does. Alternative groups are analyzed from left to right, and that happens at each position in the input string.

Thus, putting the most common matches at the start is already a boost.

When speaking about unanchored alternation lists in NFA regex (as in Python), it is important that alternatives that can match at the same location should be ordered in such a way that the longest comes first because otherwise a shorter alternative will always "win", and you may end up with xxxone when matching with some|someone -> xxx wanting to get xxx from someone.

like image 135
Wiktor Stribiżew Avatar answered Jan 23 '26 03:01

Wiktor Stribiżew



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!