
Why is using a regex pre-compiled with qr slower than using a constant regex?

Tags:

perl

I just saw this question about optimizing a particular regular expression in Perl. I wondered how many matches per second my machine could do, so I tried the following simple benchmark:

  • case 1 - using a regex pre-compiled with qr
  • case 2 - plain /regex/ match
use 5.014;
use warnings;

use Benchmark qw(:all);

my $str = "SDZ";
my $qr = qr/S?T?K?P?W?H?R?A?O?\*?E?U?F?R?P?B?L?G?T?S?D?Z?/;

say "match [$&]" if( $str =~ $qr );

my $res = timethese(-10, {
    stdrx => sub { $str =~ /S?T?K?P?W?H?R?A?O?\*?E?U?F?R?P?B?L?G?T?S?D?Z?/ },
    qr_rx => sub { $str =~ $qr },
});

cmpthese $res;

To my surprise, it gave the following result:

match [SDZ]
Benchmark: running qr_rx, stdrx for at least 10 CPU seconds...
     qr_rx: 10 wallclock secs ( 9.99 usr +  0.01 sys = 10.00 CPU) @ 1089794.90/s (n=10897949)
     stdrx: 11 wallclock secs (10.58 usr +  0.04 sys = 10.62 CPU) @ 1651340.11/s (n=17537232)
           Rate qr_rx stdrx
qr_rx 1089795/s    --  -34%
stdrx 1651340/s   52%    --

i.e. the plain $str =~ /regex/ is about 50% faster than using $str =~ $qr. I expected the opposite result.
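To put the rates in absolute terms, here is a small sketch that converts the two rates reported above into time per match. The numbers are copied from this single run, so treat the result as a rough figure for this machine only.

```perl
use strict;
use warnings;

# Rates copied from the Benchmark output above (matches per second).
my %rate = (qr_rx => 1089794.90, stdrx => 1651340.11);

# Convert each rate into nanoseconds spent per match.
my %ns = map { $_ => 1e9 / $rate{$_} } keys %rate;

printf "%-6s %.0f ns per match\n", $_, $ns{$_} for sort keys %ns;
printf "difference: %.0f ns per match\n", $ns{qr_rx} - $ns{stdrx};
```

The gap comes out at roughly 0.3 microseconds per match, which is worth keeping in mind when reading the percentages.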

Am I doing something wrong? Why am I getting this result?

EDIT:

I just downloaded the cited book; I have much to learn :). But the book also says:

If a regex literal has no variable interpolation, Perl knows that the regex can’t change from use to use, so after the regex is compiled once, that compiled form is saved (“cached”) for use whenever execution again reaches the same code. The regex is examined and compiled just once, no matter how often it’s used during the program’s execution.

So, in the code above, both regexes are literals without variable interpolation, and the "precompiled" regex should be just as fast as the plain one. In the example, however, it is noticeably slower.

Ikegami explained why $str =~ $qr is slower (and honestly, "slower" isn't quite the right term, because we are talking about a fraction of a microsecond per match... :)).

But the Perl docs say:

Precompilation of the pattern into an internal representation at the moment of qr() avoids the need to recompile the pattern every time a match /$pat/ is attempted.

From the point of view of an ordinary Perl user (not some high-level Perl monk), this reads as: precompile your pattern and it will be faster. But the truth is that it helps only if the regex contains some "non-static" parts...
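As an illustration of a pattern with "non-static" parts, here is a minimal sketch in which the pattern is built at run time and compiled once with qr// before being used repeatedly. The word list and input lines are hypothetical, made up for this example.

```perl
use strict;
use warnings;

# Hypothetical word list; in practice it might come from a file or user input.
my @words = qw(foo bar baz);
my $alt   = join '|', map { quotemeta($_) } @words;

# Compile the run-time-built pattern once, outside the loop.
my $re = qr/\b(?:$alt)\b/;

# Hypothetical input lines to search.
my @lines = ('a foo here', 'nothing', 'baz at end');
my @hits  = grep { $_ =~ $re } @lines;

print scalar(@hits), " matching lines\n";
```

This is only a sketch of the usage, not a measurement. Whether it is actually faster than interpolating the string directly would need its own benchmark.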

Honestly, I still don't fully understand this, but I got the book and am going to learn. :) Maybe one more sentence in the docs could help beginners avoid misunderstanding qr when they start learning.

Thank you all!

cajwine asked Mar 24 '17 22:03


1 Answer

Regex patterns are compiled at compile-time if they don't interpolate. Neither the regex in the qr// operator nor the one in the match operator in stdrx interpolates, so both are compiled at compile-time.

The extra time in the qr_rx test (roughly 0.3 μs per match, going by the rates in the benchmark output) is spent "compiling" the third regex: the one in the match operator in qr_rx. Don't forget that $_ =~ $re is short for $_ =~ m/$re/. Now, no compilation actually occurs when the whole pattern consists of an interpolated pre-compiled regex, because that case is handled specially, but it apparently still takes a bit of time to coax the match op into using the pre-compiled regex. (Maybe it needs to clone it?)
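A minimal sketch of the equivalence mentioned above. It uses a shortened stand-in pattern (an assumption for brevity, not the question's full regex) and checks that a bare qr object and an explicit m/$qr/ match the same span of the string.

```perl
use strict;
use warnings;

my $str = "SDZ";
my $qr  = qr/S?D?Z?/;    # shortened stand-in for the question's pattern

# qr// returns a compiled regex object.
print ref($qr), "\n";

# Bare qr object on the right-hand side of =~ ...
my @bare = ($str =~ $qr) ? ($-[0], $+[0]) : ();

# ... and the explicit match operator with the object interpolated.
my @explicit = ($str =~ m/$qr/) ? ($-[0], $+[0]) : ();

print "same span: @bare vs @explicit\n";
```

Both forms go through a match operator, so the comparison in the benchmark is really between a literal pattern inside m// and an interpolated qr object inside m//.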

ikegami answered Sep 23 '22 02:09