
Perl phone-number regex

Tags: regex, perl

Sorry for asking such a simple question; I'm still an inexperienced programmer. I stumbled across a phone-number-matching regex in some old Perl code at work, and I'd love it if somebody could explain exactly what it means (my regex skills are severely lacking).

if ($value !~ /^\+[[:space:]]*[0-9][0-9.[:space:]-]*(\([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*\))?([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?([[:space:]]+ext.[0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?$/i) {
    ...
}

Thank you in advance :)

Jordan asked Dec 06 '22 18:12


2 Answers

The code roughly says "you should replace this with Number::Phone".

All joking and good advice aside, the first thing to do when figuring out a regex is to expand it with /x. The first pass is to break things up by capture group.

/^
 \+[[:space:]]*[0-9][0-9.[:space:]-]*
 (\([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*\))?
 ([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?
 ([[:space:]]+ext.[0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?
$/xi
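If you want reassurance that reformatting with /x hasn't changed the behaviour, a quick sanity check is to run both forms over a few strings and compare. This is only a sketch: the sample inputs are made up for illustration, and a handful of agreeing cases is a smoke test, not a proof.

```perl
use strict;
use warnings;

# The original one-line pattern, copied from the question.
my $compact = qr/^\+[[:space:]]*[0-9][0-9.[:space:]-]*(\([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*\))?([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?([[:space:]]+ext.[0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?$/i;

# The same pattern split across lines; /x makes Perl ignore this whitespace.
my $expanded = qr/^
 \+[[:space:]]*[0-9][0-9.[:space:]-]*
 (\([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*\))?
 ([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?
 ([[:space:]]+ext.[0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?
$/xi;

# Hypothetical sample inputs, chosen only for illustration.
my @samples = ('+1 (555) 123-4567', '+44 20 7946 0958 ext. 12', '555-1234', '+', 'ext 5');

my $disagreements = 0;
for my $sample (@samples) {
    my $compact_hit  = $sample =~ $compact  ? 1 : 0;
    my $expanded_hit = $sample =~ $expanded ? 1 : 0;
    $disagreements++ if $compact_hit != $expanded_hit;
    printf "%-28s compact=%d expanded=%d\n", $sample, $compact_hit, $expanded_hit;
}
print "disagreements: $disagreements\n";
```

The two forms should always agree here, because none of the added whitespace sits inside a bracketed character class.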

Then, since this is dominated by character sets, I'd space by character sets.

/^
 \+ [[:space:]]* [0-9] [0-9.[:space:]-]*
 ( \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) )?
 ( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* )?
 ( [[:space:]]+ ext . [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* )?
$/xi

Now you can start to see some similar elements. Try lining those up to see the similarities.

/^
 \+        [[:space:]]* [0-9] [0-9.[:space:]-]*
 ( \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) )?
 (    [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]*    )?
 ( [[:space:]]+ 
   ext . 
      [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* 
 )?
$/xi

Then zero in on an element and try to figure it out. This is the important one: [0-9.[:space:]-]* means "zero or more numbers, spaces, dashes or dots" (where "numbers" are single digits and "spaces" are any whitespace, since [[:space:]] also matches tabs and newlines). On its own that doesn't make much sense for phone parsing; maybe it will make more sense in context. Let's look at a line where we can guess what it's trying to do.

( \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) )?
  • Open paren.
  • Zero or more numbers, spaces, dashes or dots.
  • A number
  • Zero or more numbers, spaces, dashes or dots.
  • Close paren.

The parens suggest this is trying to parse an area code. Inside them, any mix of numbers, spaces, dashes or dots is allowed, but the [0-9] in the middle ensures there is at least one digit. This is likely the author's way of dealing with the multitude of phone number formats.
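To convince yourself that the lone [0-9] really is what forces at least one digit, you can try that one element on its own. A minimal sketch, assuming we anchor it with ^ and $ so it has to match the whole string; the test inputs are invented for illustration.

```perl
use strict;
use warnings;

# The parenthesised element from above, anchored so it has to match the whole string.
my $area_re = qr/^ \( [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]* \) $/x;

# Hypothetical inputs paired with the result I'd expect (1 = match, 0 = no match).
my @cases = (
    [ '(555)',     1 ],  # plain digits
    [ '( 5-5.5 )', 1 ],  # separators around the digits
    [ '(5)',       1 ],  # a single digit is enough
    [ '()',        0 ],  # no digit at all
    [ '( - . )',   0 ],  # separators only, still no digit
    [ '(5a5)',     0 ],  # letters are not in the character class
);

my $surprises = 0;
for my $case (@cases) {
    my ($input, $expected) = @$case;
    my $got = $input =~ $area_re ? 1 : 0;
    $surprises++ if $got != $expected;
    printf "%-10s %s\n", $input, $got ? 'match' : 'no match';
}
print "surprises: $surprises\n";
```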

Let's give the character class [0-9.[:space:]-] a name and call it phone_chars, because it's what the author has decided phone numbers are made of. There's another recurring element, [0-9.[:space:]-]* [0-9] [0-9.[:space:]-]*, which I'll call a "phone atom" because it's what the author decided an atom of a phone number can be. If we put each of those in its own regex and build the phone regex from them, things become a lot clearer.

my $phone_chars = qr{[0-9.[:space:]-]};
my $phone_atom  = qr{$phone_chars* [0-9] $phone_chars*}x;

/^
 \+ [[:space:]]* [0-9] $phone_chars*
 ( \( $phone_atom \) )?
 (    $phone_atom    )?
 ( [[:space:]]+ ext . $phone_atom )?
$/xi;

If you know something about phone numbers, the structure reads like this:

  1. Mandatory country code (which must start with a + and a number)
  2. Optional area code
  3. Optional phone number
  4. Optional extension
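To see how those four pieces map onto an actual string, here is a sketch that adds named captures to the same regex. The capture names (country, area, number, ext) and the split_phone helper are my own labels for illustration; they are not in the original code.

```perl
use strict;
use warnings;
use v5.10;

my $phone_chars = qr{[0-9.[:space:]-]};
my $phone_atom  = qr{$phone_chars* [0-9] $phone_chars*}x;

# Same structure as the regex above, with named captures added for illustration.
my $phone_re = qr{^
 (?<country> \+ [[:space:]]* [0-9] $phone_chars* )
 (?<area>    \( $phone_atom \) )?
 (?<number>     $phone_atom    )?
 (?<ext>     [[:space:]]+ ext . $phone_atom )?
$}xi;

# Hypothetical helper: return the four pieces as a hash ref, or nothing on no match.
sub split_phone {
    my ($value) = @_;
    return unless $value =~ $phone_re;
    my %parts;
    $parts{$_} = $+{$_} for qw(country area number ext);
    return \%parts;
}

for my $value ('+1 (555) 123-4567 ext. 89', '+44 20 7946 0958', '(555) 123-4567') {
    my $parts = split_phone($value);
    if (!$parts) {
        say "$value: no match";
        next;
    }
    say "$value:";
    say "  $_ = ", defined $parts->{$_} ? "[$parts->{$_}]" : '(not captured)'
        for qw(country area number ext);
}
```

Note what happens with '+44 20 7946 0958': the greedy phone_chars* in the first piece swallows the whole string, so the "country code" capture holds everything and the number capture stays empty. The labels describe the author's apparent intent more than what the regex actually separates.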

This regex doesn't do a very good job validating phone numbers. According to this regex, "+1" is a valid phone number, but "(555) 123-4567" isn't because it doesn't have a country code.
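Those two claims are easy to check against the original pattern. A small sketch; the accepted helper is a hypothetical name, and the last two inputs are extra cases I added for illustration.

```perl
use strict;
use warnings;

# The pattern from the question, unchanged apart from being stored with qr//.
my $phone_re = qr/^\+[[:space:]]*[0-9][0-9.[:space:]-]*(\([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*\))?([0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?([[:space:]]+ext.[0-9.[:space:]-]*[0-9][0-9.[:space:]-]*)?$/i;

# Hypothetical helper: 1 if the pattern accepts the string, 0 otherwise.
sub accepted {
    my ($value) = @_;
    return $value =~ $phone_re ? 1 : 0;
}

for my $value ('+1', '(555) 123-4567', '+1 (555) 123-4567', '+1....----') {
    printf "%-20s %s\n", $value, accepted($value) ? 'accepted' : 'rejected';
}
```

Only '(555) 123-4567' is rejected. Even '+1....----' gets through, because after the first digit the pattern allows any run of dots, dashes and whitespace.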

Phone number validation is hard. Did I mention Number::Phone?

use strict;
use warnings;
use v5.10;

use Number::Phone;

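my $number = Number::Phone->new("+1(555)456-2398");
# new() can return undef for a number it does not accept, so check before calling methods.
say( ($number && $number->is_valid) ? "valid" : "invalid" );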
Schwern answered Dec 23 '22 22:12


Amazing what extended mode, a little whitespace and a few comments can do ...

if ($value !~  /
      ^                 # Anchor to start of string

     \+                 # followed (immediately) by literal '+'
     [[:space:]]*       # zero or more chars in the POSIX character class 'space'
     [0-9]              # compulsory digit
     [0-9.[:space:]-]*  # zero or more digit, full-stop, space or hyphen

     (                  # start capture to $1
         \(                   # Literal open parenthesis
         [0-9.[:space:]-]*    # zero or more ... (as above)
         [0-9]                # compulsory digit
         [0-9.[:space:]-]*    # zero or more ... (as above)
         \)                   # Literal close parenthesis
     )?                 # close capture to $1 - whole thing optional

     (                  # start capture to $2
         [0-9.[:space:]-]*    # zero or more ... (as above)
         [0-9]                # compulsory digit
         [0-9.[:space:]-]*    # zero or more ... (as above)
     )?                 # close capture to $2 - whole thing optional

     (                  # start capture to $3
         [[:space:]]+         # at least one whitespace char (as defined by POSIX)
         ext.                 # literal 'ext' followed by any one character (the '.' is unescaped)
         [0-9.[:space:]-]*    # zero or more ... (as above)
         [0-9]                # compulsory digit
         [0-9.[:space:]-]*    # zero or more ... (as above)
     )?                 # close capture to $3 - whole thing optional

      $                 # Anchor to end of string
              /ix       # close regex; ignore case, extended mode options
   )  {
    ...
}
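One thing to watch when converting a regex to extended mode yourself: under /x a bare space in the pattern no longer matches a space. The regex above is safe because it spells whitespace as [[:space:]], but a pattern with literal spaces would silently change meaning. A tiny sketch with made-up strings:

```perl
use strict;
use warnings;

my $plain    = 'a b' =~ /a b/    ? 1 : 0;  # without /x the space is a literal character
my $extended = 'a b' =~ /a b/x   ? 1 : 0;  # with /x the space is ignored, so this is really /ab/
my $escaped  = 'a b' =~ /a\ b/x  ? 1 : 0;  # a backslash-escaped space is literal again
my $bracket  = 'a b' =~ /a[ ]b/x ? 1 : 0;  # so is a space inside a bracketed class

print "plain=$plain extended=$extended escaped=$escaped bracket=$bracket\n";
# prints "plain=1 extended=0 escaped=1 bracket=1"
```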
Marty answered Dec 23 '22 21:12