Sorry for asking such a simple question; I'm still an inexperienced programmer. I stumbled across a phone-number-matching regex in some old Perl code at work, and I'd love it if somebody could explain exactly what it means (my regex skills are severely lacking).
if ($value !~ /^\+[[:space:]]*[0-9][0-9.[:space:]-]*(\([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*\))?([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?([[:space:]]+ext.[0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?$/i) {
...
}
Thank you in advance :)
The code roughly says "you should replace this with Number::Phone".
All joking and good advice aside, the first thing to do when figuring out a regex is to expand it with /x. The first pass is to break things up by capture group.
/^
\+[[:space:]]*[0-9][0-9.[:space:]-]*
(\([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*\))?
([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?
([[:space:]]+ext.[0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?
$/xi
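If you want to convince yourself that /x only changes the layout and not the meaning, one quick sketch is to compile both forms and run them over a few strings. The sample inputs are made up purely for illustration.

```perl
use strict;
use warnings;

# The question's regex in its original one-line form.
my $compact = qr/^\+[[:space:]]*[0-9][0-9.[:space:]-]*(\([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*\))?([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?([[:space:]]+ext.[0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?$/i;

# The same pattern split up by capture group. Under /x, whitespace
# outside character classes is ignored, so the meaning is unchanged.
my $expanded = qr/^
    \+[[:space:]]*[0-9][0-9.[:space:]-]*
    (\([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*\))?
    ([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?
    ([[:space:]]+ext.[0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?
$/xi;

# Made-up sample inputs, just to exercise the pattern.
my @samples = ('+1', '+1 (555) 123-4567', '+44 20.7946.0000',
               '+1 555 1234 ext. 99', '(555) 123-4567', '+', 'hello');

for my $s (@samples) {
    my $c = $s =~ $compact  ? 'match' : 'no match';
    my $e = $s =~ $expanded ? 'match' : 'no match';
    printf "%-22s compact: %-8s expanded: %s\n", "'$s'", $c, $e;
}
```

Both columns should agree on every line; if they ever differ, the expansion changed something other than whitespace.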
Then, since this is dominated by character sets, I'd space it out by character set.
/^
\+ [[:space:]]* [0-9] [0-9.[:space:]-]*
( \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) )?
( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* )?
( [[:space:]]+ ext . [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* )?
$/xi
Now you can start to see some similar elements. Try lining those up to see the similarities.
/^
\+ [[:space:]]* [0-9] [0-9.[:space:]-]*
( \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) )?
( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* )?
( [[:space:]]+
ext .
[0-9.[:space:]-]* [0-9] [0-9.[:space:]-]*
)?
$/xi
Then zero in on an element and try to figure it out. This is the important one: [0-9.[:space:]-]*, meaning "zero or more digits, spaces, dashes or dots". That doesn't make much sense for phone parsing on its own; maybe it will make more sense in context. Let's look at a line where we can guess what it's trying to do.
( \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) )?
The parens suggest this is trying to parse an area code. The rest limits it to any number of digits, spaces, dashes or dots, but the [0-9] ensures there is at least one digit. This is likely the author's way of dealing with the multitude of phone number formats.
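To see the "at least one digit" rule in isolation, here's a small sketch that pulls just the parenthesized group out of the regex and anchors it so it must match the whole string. The test strings are hypothetical.

```perl
use strict;
use warnings;

# Just the "area code" group from the question's regex, anchored so it
# has to match the whole string: '(' then digits, dots, whitespace or
# hyphens with at least one digit somewhere, then ')'.
my $area = qr/^ \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) $/x;

for my $s ('(555)', '(5)', '( 0-1 )', '(.-.)', '()', '(abc)') {
    printf "%-9s %s\n", $s, ($s =~ $area ? 'match' : 'no match');
}
```

The first three match and the last three don't: '(.-.)' is made entirely of allowed characters but has no digit, so the mandatory [0-9] rejects it.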
Let's give the character set [0-9.[:space:]-] a name, call it phone_chars, because it's what the author has decided phone numbers are made of. There's another recurring element, [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]*, which I'll call a "phone atom" because it's what the author decided an atom of a phone number can be. If we put each of them in its own regex and build the phone regex from them, things become a lot clearer.
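# What the author has decided phone numbers are made of
my $phone_chars = qr{[0-9.[:space:]-]};
# Phone characters with at least one digit somewhere in them
my $phone_atom = qr{$phone_chars* [0-9] $phone_chars*}x;

if ($value !~ /^
\+ [[:space:]]* [0-9] $phone_chars*
( \( $phone_atom \) )?
( $phone_atom )?
( [[:space:]]+ ext . $phone_atom )?
$/xi) {
...
}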
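Compiling the whole thing with qr// also makes it easy to poke at the three capture groups. This is only a sketch; phone_parts is a made-up helper name, not something from the original code.

```perl
use strict;
use warnings;

# Building blocks as named above.
my $phone_chars = qr{[0-9.[:space:]-]};
my $phone_atom  = qr{$phone_chars* [0-9] $phone_chars*}x;

# Same structure as the refactored regex, compiled once with qr//.
my $phone_re = qr/^
    \+ [[:space:]]* [0-9] $phone_chars*
    ( \( $phone_atom \) )?
    ( $phone_atom )?
    ( [[:space:]]+ ext . $phone_atom )?
$/xi;

# Hypothetical helper: returns ($1, $2, $3) on a match (undef for any
# group that didn't take part), or an empty list if there's no match.
sub phone_parts {
    my ($str) = @_;
    return $str =~ $phone_re ? ($1, $2, $3) : ();
}

my @parts = phone_parts('+1 (555) 123-4567');
print join(' | ', map { defined $_ ? "'$_'" : 'undef' } @parts), "\n";
```

This prints '(555)' | ' 123-4567' | undef. Note how greedy the leading [0-9] $phone_chars* is: for a number without parentheses, such as '+1 555 1234 ext. 99', it swallows all the digits, so $1 and $2 stay undef and only $3 (' ext. 99') is set.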
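If you know something about phone numbers, it's like this: a + and a country code, then an optional area code in parentheses, an optional local number, and an optional extension introduced by "ext".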
This regex doesn't do a very good job validating phone numbers. According to this regex, "+1" is a valid phone number, but "(555) 123-4567" isn't because it doesn't have a country code.
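A quick sketch to check those claims, using the question's regex verbatim (only wrapped in qr//). '+1 ...---' is an extra made-up input showing how loose the pattern is.

```perl
use strict;
use warnings;

# The question's regex, copied as-is and compiled with qr//.
my $re = qr/^\+[[:space:]]*[0-9][0-9.[:space:]-]*(\([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*\))?([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?([[:space:]]+ext.[0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?$/i;

for my $s ('+1', '+1 ...---', '(555) 123-4567', '+1 (555) 123-4567') {
    printf "%-20s %s\n", "'$s'", ($s =~ $re ? 'accepted' : 'rejected');
}
```

'+1' and '+1 ...---' are accepted, while '(555) 123-4567' is rejected until a leading '+' and country code are added.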
Phone number validation is hard. Did I mention Number::Phone?
use strict;
use warnings;
use v5.10;
use Number::Phone;
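# new() can return undef (e.g. for a number it can't parse),
# so check that before calling a method on the result
my $number = Number::Phone->new("+1(555)456-2398");
my $is_valid = $number && $number->is_valid;
say $is_valid ? "valid" : "not valid";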
Amazing what extended mode, a little whitespace and a few comments can do ...
if ($value !~ /
^ # Anchor to start of string
\+ # followed (immediately) by literal '+'
[[:space:]]* # zero or more chars in the POSIX character class 'space'
[0-9] # compulsory digit
[0-9.[:space:]-]* # zero or more digits, full stops, spaces or hyphens
( # start capture to $1
\( # literal open parenthesis
[0-9.[:space:]-]* # zero or more ... (as above)
[0-9] # compulsory digit
[0-9.[:space:]-]* # zero or more ... (as above)
\) # literal close parenthesis
)? # close capture to $1 - whole thing optional
( # start capture to $2
[0-9.[:space:]-]* # zero or more ... (as above)
[0-9] # compulsory digit
[0-9.[:space:]-]* # zero or more ... (as above)
)? # close capture to $2 - whole thing optional
( # start capture to $3
[[:space:]]+ # at least one space (as defined by POSIX)
ext. # literal 'ext' followed by any character
[0-9.[:space:]-]* # zero or more ... (as above)
[0-9] # compulsory digit
[0-9.[:space:]-]* # zero or more ... (as above)
)? # close capture to $3 - whole thing optional
$ # Anchor to end of string
/ix # close regex; ignore case, extended mode options
) {
...
}
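One thing the comments make easy to spot is that `ext.` uses an unescaped dot, so it accepts "ext" followed by any single character, and /i makes "ext" case-insensitive. Here is a sketch to check that, using the expanded form above without the comments; the sample numbers are made up.

```perl
use strict;
use warnings;

# The regex from the question in /x form (comments left out here).
my $re = qr/
    ^ \+ [[:space:]]* [0-9] [0-9.[:space:]-]*
    ( \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) )?
    ( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* )?
    ( [[:space:]]+ ext. [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* )?
    $
/ix;

# 'ext.' is 'ext' plus any one character (the dot is not escaped),
# and /i makes 'ext' case-insensitive.
for my $s ('+1 555 1234 ext. 99', '+1 555 1234 EXT 99',
           '+1 555 1234 extX99', '+1 555 1234 ext.') {
    printf "%-22s %s\n", "'$s'", ($s =~ $re ? 'accepted' : 'rejected');
}
```

The first three are accepted; the last is rejected because the extension still needs at least one digit. If only a literal dot was intended, `ext\.` would be the usual way to write it.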