Perl phone-number regex

Question

Sorry for asking such a simple question; I'm still an inexperienced programmer. I stumbled across a phone-number-matching regex in some old Perl code at work, and I'd love it if somebody could explain exactly what it means (my regex skills are severely lacking).

if ($value !~ /^\+[[:space:]]*[0-9][0-9.[:space:]-]*(\([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*\))?([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?([[:space:]]+ext.[0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?$/i) {
    ...
}

Thank you in advance :)

Schwern · Accepted Answer

The code roughly says "you should replace this with Number::Phone".

All joking and good advice aside, the first thing to do when figuring out a regex is to expand it with /x, which makes Perl ignore unescaped whitespace in the pattern so you can space it out. The first pass is to break things up by capture group.

/^
 \+[[:space:]]*[0-9][0-9.[:space:]-]*
 (\([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*\))?
 ([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?
 ([[:space:]]+ext.[0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?
$/xi

Then, since this is dominated by character sets, I'd space by character sets.

/^
 \+ [[:space:]]* [0-9] [0-9.[:space:]-]*
 ( \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) )?
 ( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* )?
 ( [[:space:]]+ ext . [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* )?
$/xi

Now you can start to see some similar elements. Try lining those up to see the similarities.

/^
 \+        [[:space:]]* [0-9] [0-9.[:space:]-]*
 ( \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) )?
 (    [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]*    )?
 ( [[:space:]]+ 
   ext . 
      [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* 
 )?
$/xi
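A quick way to reassure yourself that the reformatting hasn't changed the regex is to run both versions over a few inputs and compare. This is only a sketch: the sample strings are made up for illustration, and it assumes the area-code group uses escaped literal parentheses, \( and \).

```perl
use strict;
use warnings;

# The original one-line regex (literal parentheses written as \( and \)).
my $compact = qr/^\+[[:space:]]*[0-9][0-9.[:space:]-]*(\([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*\))?([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?([[:space:]]+ext.[0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?$/i;

# The same regex spread out with /x and lined up.
my $expanded = qr/^
 \+        [[:space:]]* [0-9] [0-9.[:space:]-]*
 ( \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) )?
 (    [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]*    )?
 ( [[:space:]]+
   ext .
      [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]*
 )?
$/xi;

# Hypothetical sample inputs, chosen only for illustration.
my @samples = (
    '+1',
    '+1 (555) 123-4567',
    '+44 20 7946 0958 ext. 12',
    '(555) 123-4567',
    '+',
    'hello',
);

# Count inputs on which the two versions disagree.
my $mismatches = 0;
for my $sample (@samples) {
    my $compact_ok  = $sample =~ $compact  ? 1 : 0;
    my $expanded_ok = $sample =~ $expanded ? 1 : 0;
    $mismatches++ if $compact_ok != $expanded_ok;
    printf "%-26s compact=%d expanded=%d\n", $sample, $compact_ok, $expanded_ok;
}
print "mismatches: $mismatches\n";
```

If the two ever disagree, the usual culprit is whitespace that was meant literally: under /x a bare space in the pattern is ignored, so it has to be escaped or put inside a bracketed character class.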

Then zero in on an element and try to figure it out. This is the important one: [0-9.[:space:]-]*, meaning "zero or more numbers, spaces, dashes or dots". That doesn't make much sense for phone parsing on its own; maybe it will make more sense in context. Let's look at a line where we can guess what it's trying to do.

( \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) )?

Open paren.
Zero or more numbers, spaces, dashes or dots.
A number
Zero or more numbers, spaces, dashes or dots.
Close paren.

The parens suggest this is trying to parse an area code. The rest limits it to any number of numbers, spaces, dashes or dots, but the [0-9] ensures there is at least one number. This is likely the author's way of dealing with the multitude of phone number formats.
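To see the "at least one number" rule in action, here is a small sketch that tests the area-code piece on its own. The anchors ^ and $ are added here so the piece has to match the whole string, and the inputs are invented for illustration.

```perl
use strict;
use warnings;

# The area-code group by itself, anchored at both ends for this experiment.
my $area_code = qr/^ \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) $/x;

# Hypothetical inputs: the first three contain a digit between the parens,
# the last three do not.
my @inputs = ('(555)', '( 5 )', '(5-5.5)', '()', '(--)', '(abc)');

my %got;
for my $input (@inputs) {
    $got{$input} = $input =~ $area_code ? 1 : 0;
    printf "%-8s %s\n", $input, $got{$input} ? 'matches' : 'does not match';
}
```

The surrounding character classes can match nothing at all, so the lone [0-9] in the middle is the only thing stopping "()" or "(--)" from being accepted.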

Let's give the character class [0-9.[:space:]-] a name, phone_chars, because it's what the author has decided phone numbers are made of. There's another recurring element, [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]*, which I'll call a "phone atom" because it's what the author decided an atom of a phone number can be. If we put each of those in its own regex and build the phone regex from them, things become a lot clearer.

my $phone_chars = qr{[0-9.[:space:]-]};
my $phone_atom  = qr{$phone_chars* [0-9] $phone_chars*}x;

/^
 \+ [[:space:]]* [0-9] $phone_chars*
 ( \( $phone_atom \) )?
 (    $phone_atom    )?
 ( [[:space:]]+ ext . $phone_atom )?
$/xi;

If you know something about phone numbers, it's like this:

Mandatory country code (which must start with a + and a number)
Optional area code
Optional phone number
Optional extension

This regex doesn't do a very good job validating phone numbers. According to this regex, "+1" is a valid phone number, but "(555) 123-4567" isn't because it doesn't have a country code.
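Those two claims are easy to check with the refactored regex. This is a sketch only: looks_like_phone is a helper name made up for this example, and the third input is just an illustrative number.

```perl
use strict;
use warnings;

my $phone_chars = qr{[0-9.[:space:]-]};
my $phone_atom  = qr{$phone_chars* [0-9] $phone_chars*}x;

my $phone_re = qr/^
 \+ [[:space:]]* [0-9] $phone_chars*
 ( \( $phone_atom \) )?
 (    $phone_atom    )?
 ( [[:space:]]+ ext . $phone_atom )?
$/xi;

# Hypothetical helper: returns 1 if the regex accepts the input, 0 otherwise.
sub looks_like_phone {
    my ($input) = @_;
    return $input =~ $phone_re ? 1 : 0;
}

for my $input ('+1', '(555) 123-4567', '+1 (555) 123-4567') {
    printf "%-20s %s\n", $input,
        looks_like_phone($input) ? 'accepted' : 'rejected';
}
```

The bare "+1" is accepted because everything after the first digit is optional, while the familiar "(555) 123-4567" is rejected at the very first character for lacking a "+".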

Phone number validation is hard. Did I mention Number::Phone?

use strict;
use warnings;
use v5.10;

use Number::Phone;

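# new() returns undef when it cannot make sense of the number,
# so check the object before calling methods on it.
my $number = Number::Phone->new("+1(555)456-2398");
say $number && $number->is_valid ? "valid" : "invalid";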

Marty · Answer

Amazing what extended mode, a little whitespace and a few comments can do ...

if ($value !~  /
      ^                 # Anchor to start of string

     \+                 # followed (immediately) by literal '+'
     [[:space:]]*       # zero or more chars in the POSIX character class 'space'
     [0-9]              # compulsory digit
     [0-9.[:space:]-]*  # zero or more digits, full stops, spaces or hyphens

     (                  # start capture to $1
         \(                   # literal open parenthesis
         [0-9.[:space:]-]*    # zero or more ... (as above)
         [0-9]                # compulsory digit
         [0-9.[:space:]-]*    # zero or more ... (as above)
         \)                   # literal close parenthesis
     )?                 # close capture to $1 - whole thing optional

     (                  # start capture to $2
         [0-9.[:space:]-]*    # zero or more ... (as above)
         [0-9]                # compulsory digit
         [0-9.[:space:]-]*    # zero or more ... (as above)
     )?                 # close capture to $2 - whole thing optional

     (                  # start capture to $3
         [[:space:]]+         # at least one space (as defined by POSIX)
         ext.                 # literal 'ext' followed by any character
         [0-9.[:space:]-]*    # zero or more ... (as above)
         [0-9]                # compulsory digit
         [0-9.[:space:]-]*    # zero or more ... (as above)
         [0-9.[:space:]-]*    # zero or more ... (as above)
     )?                 # close capture to $3 - whole thing optional

      $                 # Anchor to end of string
              /ix       # close regex; ignore case, extended mode options
   )  {
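Since the comments talk about captures, it can help to print what actually lands in each group. The sketch below is not from the original code: capture_groups is a made-up helper and both inputs are invented for illustration.

```perl
use strict;
use warnings;

my $phone_re = qr/
    ^ \+ [[:space:]]* [0-9] [0-9.[:space:]-]*
    ( \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) )?    # group 1
    (    [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]*    )?    # group 2
    ( [[:space:]]+ ext . [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* )?   # group 3
    $
/ix;

# Hypothetical helper: returns the three captures, or an empty list
# if the input does not match at all.
sub capture_groups {
    my ($input) = @_;
    return unless $input =~ $phone_re;
    return ($1, $2, $3);
}

for my $input ('+1 (555) 123-4567 ext. 89', '+1 555 123 4567') {
    my @groups = capture_groups($input);
    printf "%-28s => %s\n", $input,
        join ', ', map { defined $_ ? "'$_'" : 'undef' } @groups;
}
```

One thing this shows is that the groups are less tidy than their position suggests. For the second input all three captures come back undefined, because the greedy character class right after the first digit swallows the rest of the number before any group gets a chance.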

Tags: regex, perl

Asked by Jordan · 2 Answers: Schwern, Marty
