I am trying to match (and remove) any of 4000 expressions.
If I put the most common matches at the front will that speed matching (or is it undefined)
although typically exclusive, I sometimes have default cases: 'ax*|a(0-9)|', ie 'a', but I want a greedy match if possible. is it sufficient to reorder 'a(0-9)|ax*' or is this not guaranteed by the specification?
Does ordering of alternatives matter for speed/choosing between alternatives?
Yes, it does. Alternative groups are analyzed from left to right, and that happens at each position in the input string.
Thus, putting the most common matches at the start is already a boost.
When speaking about unanchored alternation lists in NFA regex (as in Python), it is important that alternatives that can match at the same location should be ordered in such a way that the longest comes first because otherwise a shorter alternative will always "win", and you may end up with xxxone when matching with some|someone -> xxx wanting to get xxx from someone.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With