(Note: The title doesn't seem too clear -- if someone can rephrase this I'm all for it!)
Given this regex: (.*_e\.txt), which matches some filenames, I need to add some other single-character suffixes in addition to the e. Should I choose a character class or should I use an alternation for this? (Or does it really matter?)
That is, which of the following two seems "better", and why:
a) (.*(e|f|x)\.txt), or
b) (.*[efx]\.txt)
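To make the two candidates concrete, here is a minimal Python sketch that runs both patterns over a few filenames. The filenames and the helper `matching_names` are made up for illustration; only the two patterns come from the question.

```python
import re

# The two candidate patterns from the question.
ALTERNATION = re.compile(r"(.*(e|f|x)\.txt)")
CHAR_CLASS = re.compile(r"(.*[efx]\.txt)")

# Hypothetical filenames, invented for this example.
NAMES = ["report_e.txt", "report_f.txt", "report_x.txt", "report_g.txt", "notes.md"]


def matching_names(pattern, names):
    """Return the names that the pattern matches in full."""
    return [name for name in names if pattern.fullmatch(name)]


alternation_hits = matching_names(ALTERNATION, NAMES)
char_class_hits = matching_names(CHAR_CLASS, NAMES)

# Both patterns accept the same names, but the alternation version
# introduces a second capturing group for the suffix.
print(alternation_hits == char_class_hits)
print(ALTERNATION.groups, CHAR_CLASS.groups)
```

One practical difference this shows: the parentheses around e|f|x create an extra capture group, so group numbering differs between a) and b) even though both accept the same filenames.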
Use square brackets [] to match any one character from a set. Use \w to match any single word character: 0-9, a-z, A-Z, and _ (underscore). Use \d to match any single digit. Use \s to match any single whitespace character.
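A short Python sketch of these shorthands, assuming the standard `re` module. The sample string is made up, and `re.ASCII` is passed so that \w and \d are limited to the ASCII ranges listed above (without it, Python also accepts Unicode letters and digits).

```python
import re

# Hypothetical sample string, chosen to contain one of each kind of character.
sample = "a_1 b"

# re.ASCII restricts \w, \d and \s to their ASCII definitions.
word_chars = re.findall(r"\w", sample, flags=re.ASCII)
digits = re.findall(r"\d", sample, flags=re.ASCII)
spaces = re.findall(r"\s", sample, flags=re.ASCII)

# A bracketed set matches exactly one character per match.
in_set = re.findall(r"[ab]", sample)

print(word_chars, digits, spaces, in_set)
```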
The alternation operator has the lowest precedence of all regex operators. That is, it tells the regex engine to match either everything to the left of the vertical bar, or everything to the right of the vertical bar. If you want to limit the reach of the alternation, you need to use parentheses for grouping.
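The effect of that low precedence can be seen in a small Python sketch. The patterns and test strings here are invented for illustration; `(?:...)` is a non-capturing group used only to limit the reach of the bar.

```python
import re

# Without grouping, the bar splits the whole pattern:
# this means "cat" OR "dog food".
ungrouped = re.compile(r"cat|dog food")

# With grouping, the bar only covers the two animal names:
# this means ("cat" OR "dog") followed by " food".
grouped = re.compile(r"(?:cat|dog) food")

ungrouped_cat = ungrouped.fullmatch("cat") is not None
ungrouped_cat_food = ungrouped.fullmatch("cat food") is not None
grouped_cat = grouped.fullmatch("cat") is not None
grouped_cat_food = grouped.fullmatch("cat food") is not None

print(ungrouped_cat, ungrouped_cat_food, grouped_cat, grouped_cat_food)
```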
Please note that although there are multiple characters in the set, they correspond to exactly one character in the match.
The character + in a regular expression means "match the preceding element one or more times". For example, A+ matches one or more of the character A. The plus character, used in a regular expression, is called a Kleene plus.
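A minimal Python sketch of the plus quantifier, using made-up input strings. The last pattern is an added illustration that + applies to whatever element precedes it, which can be a group rather than a single character.

```python
import re

one_or_more_a = re.compile(r"A+")

# One or more A's match; zero A's do not.
matches_three = one_or_more_a.fullmatch("AAA") is not None
matches_empty = one_or_more_a.fullmatch("") is not None

# Each run of consecutive A's is found as one match.
runs = one_or_more_a.findall("AABAAA")

# Applied to a group, + repeats the whole group.
repeated_group = re.fullmatch(r"(?:AB)+", "ABAB") is not None

print(matches_three, matches_empty, runs, repeated_group)
```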
Use [efx]. That's exactly what character classes are designed for: to match one of the included characters. Therefore it's also the most readable and shortest solution.
I don't know if it's faster, but I would be very much surprised if it wasn't. It definitely won't be slower.
My reasoning (without ever having written a regex engine, so this is pure conjecture):
The regex token [abc] will be applied in a single step of the regex engine: "Is the next character one of a, b, or c?"
(a|b|c), however, tells the regex engine to:
- try to match a. If so, success. If not:
- try to match b. If so, success. If not:
- try to match c. If so, success. If not:
- give up at this position.
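Since the reasoning above is conjecture, one way to check it for a particular engine is to time both patterns. The following is a rough sketch for Python's `re` module only; the filename list, the helper names `count_matches` and `time_pattern`, and the repeat count are all assumptions made for illustration. Results depend on the engine and version, and some engines may internally rewrite a single-character alternation into a character class, in which case little or no difference would show up.

```python
import re
import timeit

ALTERNATION = re.compile(r"(.*(e|f|x)\.txt)")
CHAR_CLASS = re.compile(r"(.*[efx]\.txt)")

# Hypothetical filenames: 200 stems, each with suffixes e, f, x and g.
NAMES = [f"file_{i}_{suffix}.txt" for i in range(200) for suffix in "efxg"]


def count_matches(pattern, names):
    """Count how many names the pattern matches in full."""
    return sum(1 for name in names if pattern.fullmatch(name))


def time_pattern(pattern, names, repeats=20):
    """Return the total seconds taken to scan all names `repeats` times."""
    return timeit.timeit(lambda: count_matches(pattern, names), number=repeats)


alternation_seconds = time_pattern(ALTERNATION, NAMES)
char_class_seconds = time_pattern(CHAR_CLASS, NAMES)

print(f"alternation: {alternation_seconds:.4f}s")
print(f"char class:  {char_class_seconds:.4f}s")
```

Treat any numbers from this as a measurement of one engine on one machine, not as a general statement about alternation versus character classes.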