I am writing some Perl scripts where I need to do a lot of string matching. For example:
my $str1 = "this is a test string";
my $str2 = "test";
To check whether $str1 contains $str2, I found two approaches:
Approach 1: use the index function:
if ( index($str1, $str2) != -1 ) { .... }
Approach 2: use a regular expression:
if( $str1 =~ /$str2/ ) { .... }
Which is better, and when should each be used over the other?
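One difference worth noting before comparing speed: the two approaches are only equivalent when $str2 contains no regex metacharacters. The following is a minimal sketch with made-up sample strings ($haystack and $needle are illustrative names, not from the question). It shows index treating the needle literally, the interpolated regex treating "." as a wildcard, and \Q...\E restoring literal matching.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen so the needle contains a metacharacter.
my $haystack = "price is 3.50 (approx)";
my $needle   = "3.50";

# index() always treats $needle as a literal string.
my $found_by_index = index($haystack, $needle) != -1;

# Interpolated into a regex, "." matches any character, so "3x50" matches too.
my $loose = "price is 3x50" =~ /$needle/;

# \Q...\E (quotemeta) escapes metacharacters, making the pattern literal again.
my $literal = "price is 3x50" =~ /\Q$needle\E/;

print "index:   ", ($found_by_index ? "found" : "not found"), "\n";
print "loose:   ", ($loose          ? "match" : "no match"),  "\n";
print "literal: ", ($literal        ? "match" : "no match"),  "\n";
```

If the substring comes from user input or a file, either index or /\Q$str2\E/ avoids accidental pattern matches.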
Here is a benchmark of the two approaches, using the Benchmark module:
use strict;
use warnings;
use Benchmark qw(:all);

my $count = -1;    # negative count: run each sub for at least 1 CPU second
my $str1  = "this is a test string";
my $str2  = "test";
my $str3  = qr/test/;    # precompiled pattern

cmpthese($count, {
    'type1' => sub { if ( index($str1, $str2) != -1 ) { 1 } },    # index
    'type2' => sub { if ( $str1 =~ $str3 ) { 1 } },               # regex
});
Result (when a match happens):
           Rate type2 type1
type2 1747627/s    --  -70%
type1 5770465/s  230%    --
To draw a conclusion, also test the case where there is no match:
my $str2 = "text";
my $str3 = qr/text/;
Result (when a match does not happen):
           Rate type2 type1
type2 1857295/s    --  -67%
type1 5560630/s  199%    --
Conclusion:
For a fixed substring, the index function is roughly three times as fast as the regex match in this benchmark, whether or not a match occurs.
When I see code that uses index, I usually see an index within an index within an index, and so on. There is also more branching: "if found, look for this; otherwise, look for that." Almost always a single regex would have worked. So I almost always use a regex unless there is a specific reason to use index.
Unfortunately, most programmers I run into don't read regexes well, so for maintainability index should probably be used more often than I use it.
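To illustrate the nesting described above, here is a small sketch that pulls one field out of a delimited string in both styles. The input format and the helper names role_by_index and role_by_regex are hypothetical, invented for this example. It is one possible way to write each version, not the only one.

```perl
use strict;
use warnings;

# Hypothetical input: semicolon-separated key=value pairs.
my $line = "user=alice;role=admin;shell=/bin/sh";

# index/substr version: find "role=", then find the next ";" after it.
sub role_by_index {
    my ($text) = @_;
    my $start = index($text, "role=");
    return undef if $start == -1;           # key not present
    $start += length("role=");              # skip past the key itself
    my $end = index($text, ";", $start);
    $end = length($text) if $end == -1;     # value runs to end of string
    return substr($text, $start, $end - $start);
}

# Regex version: one pattern captures everything up to the next ";".
sub role_by_regex {
    my ($text) = @_;
    return $text =~ /role=([^;]*)/ ? $1 : undef;
}

print role_by_index($line), "\n";
print role_by_regex($line), "\n";
```

Both subs return the same value for this input. The index version needs two searches and two special cases, while the regex states the whole shape of the match in one line, which is the trade-off between speed, branching, and readability discussed above.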