Is there a better way to write Perl regexes with /x so the code is still easy to read?

Tags: regex, perl, perl-critic

I ran Perl::Critic on one of my scripts, and got this message:

Regular expression without "/x" flag at line 21, column 26. See page 236 of PBP.

I looked up the policy information here, and I understand that writing regular expressions in extended mode will help anyone who is looking at the code.

However, I am stuck on how to convert my code to use the /x flag.

CPAN Example:

# Match a single-quoted string efficiently...

m{'[^\\']*(?:\\.[^\\']*)*'};  #Huh?

# Same thing with extended format...

m{
    '           # an opening single quote
    [^\\']*     # any non-special chars (i.e. not backslash or single quote)
    (?:         # then all of...
        \\ .    #    any explicitly backslashed char
        [^\\']* #    followed by any non-special chars
    )*          # ...repeated zero or more times
    '           # a closing single quote
}x;

This makes sense if you only look at the regex.

My Code:

if ($line =~ /^\s*package\s+(\S+);/ ) {

I am not exactly sure how to use an extended regex inside of an if statement. I can write it like this:

    if (
        $line =~ /
        ^\s*    # starting with zero or more spaces
        package
        \s+     # at least one space
        (\S+)   # capture any non-space characters
        ;       # ending in a semi-colon
        /x
      )
    {

And this works, but I think this is almost harder to read than the original. Is there a better way (or a best practice way) to write this? I guess I could create a variable using qr//.
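
A minimal sketch of the qr// idea mentioned above: the pattern is compiled once under /x and given a name, so the if statement itself stays on one line. The variable name $package_decl and the sample line are assumptions made for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical name: $package_decl is not from the original script.
my $package_decl = qr{
    ^ \s*
    package \s+
    (\S+)       # capture the package name
    ;
}x;

# Sample input standing in for a line read from a file.
my $line = '  package My::Module;';

my $name;
if ( $line =~ $package_decl ) {
    $name = $1;
    print "found package $name\n";
}
```

The condition now reads almost like the original one-liner, while the pattern keeps its layout and comment.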

I'm not really looking for advice on re-writing this specific regex (although if I can improve it, I'll take advice) - I'm more looking for advice on how to expand a regex inside of an if statement.

I know Perl::Critic is just a guideline, but it would be nice to follow it.

Thanks in advance!

EDIT: So after receiving a few answers, it became clear to me that making a regex multi-line with comments is not always necessary. People who understand basic regex should be able to understand what my example was doing - the comments I added were maybe a little unnecessary and verbose. I like the idea of using the extended regex flag, but still embedding spaces in the regex to make each part of the regex a little more clear. Thanks for all the input!
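
As a rough sketch of the approach described in the edit, the pattern can stay on one line and use /x only to put spaces between its parts. The sample line is an assumption for illustration.

```perl
use strict;
use warnings;

my $line = 'package Foo::Bar;';   # sample input, not from the original script

my $name;
# /x makes the unescaped spaces inside the pattern insignificant,
# so they act purely as visual separators between the parts.
if ( $line =~ / ^ \s* package \s+ (\S+) ; /x ) {
    $name = $1;
    print "package name: $name\n";
}
```

One thing to watch for: under /x an unescaped space in the pattern no longer matches a space, so a literal space has to be written as `\s`, `\ `, or `[ ]`.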


asked Jun 12 '09 15:06

BrianH

1 Answer

Never write a comment that says what the code says. Comments should tell you why the code says what it says. Take a look at this monstrosity: without the comments it is very difficult to see what is going on, but the comments make it clear what each part is trying to match:

require 5.010;
my $sep         = qr{ [/.-] }x;               #allowed separators    
my $any_century = qr/ 1[6-9] | [2-9][0-9] /x; #match the century 
my $any_decade  = qr/ [0-9]{2} /x;            #match any decade or 2 digit year
my $any_year    = qr/ $any_century? $any_decade /x; #match a 2 or 4 digit year

#match the 1st through 28th for any month of any year
my $start_of_month = qr/
    (?:                         #match
        0?[1-9] |               #Jan - Sep or
        1[0-2]                  #Oct - Dec
    )
    ($sep)                      #the separator
    (?: 
        0?[1-9] |               # 1st -  9th or
        1[0-9]  |               #10th - 19th or
        2[0-8]                  #20th - 28th
    )
    \g{-1}                      #and the separator again
/x;

#match 29th - 31st for any month but Feb for any year
my $end_of_month = qr/
    (?:
        (?: 0?[13578] | 1[02] ) #match Jan, Mar, May, Jul, Aug, Oct, Dec
        ($sep)                  #the separator
        31                      #the 31st
        \g{-1}                  #and the separator again
        |                       #or
        (?: 0?[13-9] | 1[0-2] ) #match all months but Feb
        ($sep)                  #the separator
        (?:29|30)               #the 29th or the 30th
        \g{-1}                  #and the separator again
    )
/x;

#match any non-leap year date and the first part of Feb in leap years
my $non_leap_year = qr/ (?: $start_of_month | $end_of_month ) $any_year/x;

#match 29th of Feb in leap years
#BUG: 00 is treated as a non leap year
#even though 2000, 2400, etc are leap years
my $feb_in_leap = qr/
    0?2                         #match Feb
    ($sep)                      #the separator
    29                          #the 29th
    \g{-1}                      #the separator again
    (?:
        $any_century?           #any century
        (?:                     #and 2 digit years divisible by 4, excluding 00
            0[48]       | 
            [2468][048] |
            [13579][26]
        )
        |
        (?:                     #or match centuries that are divisible by 4
            16          | 
            [2468][048] |
            [3579][26]
        )
        00                      
    )
/x;

my $any_date  = qr/$non_leap_year|$feb_in_leap/;
my $only_date = qr/^$any_date$/;
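
The two techniques the code above leans on, small named qr//x fragments and the relative backreference \g{-1}, can be seen in isolation in the following simplified sketch. The fragment names $two, $year, and $dateish are hypothetical, and the pattern only checks that both separators are the same; it does not validate real dates.

```perl
use strict;
use warnings;
use 5.010;    # \g{-1} and say need Perl 5.10 or later

# Hypothetical, simplified fragments; they do not validate real dates.
my $sep  = qr{ [/.-] }x;        # allowed separators
my $two  = qr/ [0-9]{1,2} /x;   # one or two digits
my $year = qr/ [0-9]{4} /x;     # four-digit year

# ($sep) captures the first separator; \g{-1} refers back to the most
# recently opened capture group, so the second separator must be identical.
my $dateish = qr/ \A $two ($sep) $two \g{-1} $year \z /x;

my %verdict;
for my $input ( '12/25/2009', '12-25-2009', '12/25-2009' ) {
    $verdict{$input} = $input =~ $dateish ? 'consistent' : 'rejected';
    say "$input: $verdict{$input}";
}
```

Because \g{-1} is relative rather than numbered, a fragment that uses it keeps working when it is interpolated into a larger pattern that has capture groups of its own.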


answered Sep 25 '22 23:09

Chas. Owens
