I was performing a code review for a colleague and he had a regular expression that looked like this:
    if ($value =~ /^\d\d\d\d$/) {
        # do stuff
    }
I told him he should change it to:
    if ($value =~ /^\d{4}$/) {
        # do stuff
    }
To which he replied that he preferred the first for readability (I find the second more readable, but that's a religious debate I'll save for another day).
My question: is there an actual benefit to one over the other?
The *? quantifier matches the preceding element zero or more times, but as few times as possible. It is the lazy counterpart of the greedy quantifier *.
Among the special character classes in Perl is the digit class \d, which matches any single digit character, so /\d/ matches one digit. For ASCII-only input, or under the /a modifier, \d is equivalent to [0-9]; on Unicode strings it also matches decimal digits from other scripts.
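To see where \d and [0-9] can part ways, here is a minimal sketch. The helper names and the sample character (U+0664, ARABIC-INDIC DIGIT FOUR) are my own choices for illustration, and the /a modifier assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Each helper returns 1 if the whole string is exactly one "digit"
# under that pattern, otherwise 0. The names are illustrative only.
sub is_digit_d       { my ($s) = @_; return $s =~ /\A\d\z/    ? 1 : 0 }
sub is_digit_d_ascii { my ($s) = @_; return $s =~ /\A\d\z/a   ? 1 : 0 }  # /a: ASCII-only \d
sub is_digit_class   { my ($s) = @_; return $s =~ /\A[0-9]\z/ ? 1 : 0 }

# Sample data chosen for this sketch: one ASCII digit, one non-ASCII digit.
my @samples = (
    [ 'ASCII 4',                 "4" ],
    [ 'ARABIC-INDIC DIGIT FOUR', "\x{0664}" ],
);

for my $sample (@samples) {
    my ($label, $char) = @$sample;
    printf "%-24s plain=%d  with_a=%d  class=%d\n",
        $label,
        is_digit_d($char),
        is_digit_d_ascii($char),
        is_digit_class($char);
}
```

If the value is later treated as a number, the /a form or an explicit [0-9] is probably the safer choice.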
In addition, Perl defines the following:
    \w  Match a "word" character (alphanumeric plus "_")
    \W  Match a non-word character
    \s  Match a whitespace character
    \S  Match a non-whitespace character
    \d  Match a digit character
    \D  Match a non-digit character
There's no such thing as absolute readability. There's what people can individually recognize, which is why people often understand their code while nobody else can. If he never uses quantifiers, he's always going to think quantifiers are hard to read because he never learns to grok them.
I most often find that people say "more readable" when they really mean "that's what I know already" or "that's what I wrote the first time". That's not necessarily the case here, though.
An absolute quantifier like {4} is just easier to specify and communicate to other programmers. Who wants to count the number of \d's by hand? You write code for other people to read, so don't make their lives harder.
However, you might have missed the bug in that code because you were focused on the quantifier issue. The $ anchor allows a newline at the end of the string, and if a Perl Best Practices zealot comes along and blindly adds /xsm to all regexes (a painful experience I've seen more than a few times), that $ allows even more invalid input, because under /m it matches before any embedded newline. You probably want the \z absolute end-of-string anchor instead.
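A small sketch of the anchor difference, in case it helps. The three helper names and the sample inputs are made up for this illustration; the patterns are the one from the question, the same pattern with /xsm added, and a \A ... \z version.

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns 1 if the value is accepted, else 0.
sub ok_dollar     { my ($v) = @_; return $v =~ /^\d{4}$/    ? 1 : 0 }
sub ok_dollar_xsm { my ($v) = @_; return $v =~ /^\d{4}$/xsm ? 1 : 0 }
sub ok_z          { my ($v) = @_; return $v =~ /\A\d{4}\z/  ? 1 : 0 }

# Invented inputs: a clean value, a trailing newline, and two values
# with a newline embedded in the middle.
for my $input ("1234", "1234\n", "1234\nextra", "oops\n1234") {
    (my $shown = $input) =~ s/\n/\\n/g;    # make newlines visible when printing
    printf "%-14s dollar=%d  dollar_xsm=%d  z=%d\n",
        $shown, ok_dollar($input), ok_dollar_xsm($input), ok_z($input);
}
```

Only the \A ... \z version rejects everything except a bare four-digit string; the plain $ version lets a trailing newline through, and the /xsm version accepts any input that has a four-digit line somewhere in it.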
Not that it happened in your case, but code reviews tend to turn into style or syntax reviews (because those are easier to notice) and actually miss the point of checking for proper and intended behavior and correct design. Often the style problems aren't worth worrying about considering all of the other ways you could spend time to improve code. :)
They do the exact same thing, so as far as practicality goes it's a matter of preference. Is there a tiny performance difference one way or the other? Who knows, but it's surely insignificant.
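If anyone wants to measure rather than guess, something like this rough sketch with the core Benchmark module would do. The sample inputs and sub names are invented for the illustration, and the numbers will vary by machine and Perl version, so I'm not quoting any.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data: a mix of matching and non-matching values.
my @inputs = ("1234", "12345", "12a4", "", "0000");

# Each sub counts how many inputs match its form of the pattern.
sub count_repeated   { return scalar grep { /^\d\d\d\d$/ } @inputs }
sub count_quantified { return scalar grep { /^\d{4}$/ } @inputs }

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'repeated'   => \&count_repeated,
    'quantified' => \&count_quantified,
});
```

Both subs count the same matches on this data, so any difference in the table is purely about speed.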
The quantifiers are more useful (and required) when the pattern length isn't fixed, for example \d{12,16}, \d{2,}, etc.
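For instance, a quick sketch of those two variable-length forms. The sub names and the test strings are just placeholders I picked, and I've used \A and \z so that only the whole string counts.

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns 1 on a whole-string match, else 0.
sub is_12_to_16_digits  { my ($s) = @_; return $s =~ /\A\d{12,16}\z/ ? 1 : 0 }
sub is_2_or_more_digits { my ($s) = @_; return $s =~ /\A\d{2,}\z/    ? 1 : 0 }

# Try digit strings of several lengths around the boundaries.
for my $len (1, 2, 11, 12, 16, 17) {
    my $s = '7' x $len;
    printf "%2d digits: range_12_16=%d  two_or_more=%d\n",
        $len, is_12_to_16_digits($s), is_2_or_more_digits($s);
}
```

There is no way to write either of these by repeating \d a fixed number of times, short of a pile of alternations.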
I prefer \d{4}, which is easier for my brain to parse than \d\d\d\d.
Also, what if you're matching a character class rather than a simple digit? [aeiouy0-9]{4} or [aeiouy0-9][aeiouy0-9][aeiouy0-9][aeiouy0-9]?
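A rough sketch of that comparison, assuming you anchor both forms the same way. The variable and sub names and the sample words are mine; the spelled-out pattern is built with the repetition operator only to avoid typing the class four times here.

```perl
use strict;
use warnings;

my $class   = '[aeiouy0-9]';
my $spelled = join '', ($class) x 4;    # the class written out four times

my $long  = qr/\A$spelled\z/;           # repeated-class form
my $short = qr/\A[aeiouy0-9]{4}\z/;     # quantifier form

# Hypothetical helpers: each returns 1 on a match, else 0.
sub matches_long  { my ($s) = @_; return $s =~ $long  ? 1 : 0 }
sub matches_short { my ($s) = @_; return $s =~ $short ? 1 : 0 }

printf "pattern lengths: %d vs %d characters\n",
    length($spelled), length('[aeiouy0-9]{4}');

# Invented sample words to compare the two forms.
for my $word ("a1e9", "you2", "abcd", "aeiou") {
    printf "%-6s long=%d  short=%d\n",
        $word, matches_long($word), matches_short($word);
}
```

The two forms accept the same strings; the difference is only in how much pattern a reader has to scan.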