Unanchored substring searching: index vs regex?

Tags: regex, perl

I am writing some Perl scripts where I need to do a lot of string matching. For example:

my $str1 = "this is a test string";
my $str2 = "test";

To check whether $str1 contains $str2, I found two approaches:

Approach 1: use the index function:

if ( index($str1, $str2) != -1 ) { .... }

Approach 2: use a regular expression:

if( $str1 =~ /$str2/ ) { .... }

Which is better, and when should each be used over the other?

asked Jun 09 '15 23:06 by Kedar Joshi


2 Answers

Here are the results from the Benchmark module:

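use strict;
use warnings;
use Benchmark qw(:all);

my $count = -1;                       # negative count: run each sub for at least 1 CPU second
my $str1  = "this is a test string";
my $str2  = "test";
my $str3  = qr/test/;                 # precompiled, so pattern compilation is not timed

cmpthese($count, {
    'type1' => sub { if ( index($str1, $str2) != -1 ) { 1 } },   # index lookup
    'type2' => sub { if ( $str1 =~ $str3 ) { 1 } },              # regex match
});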

Result (when a match happens):

           Rate type2 type1
type2 1747627/s    --  -70%
type1 5770465/s  230%    --

To draw a conclusion, also test the case where there is no match:

my $str2 = "text";
my $str3 = qr/text/;

Result (when a match does not happen):

           Rate type2 type1
type2 1857295/s    --  -67%
type1 5560630/s  199%    --

Conclusion:

For a fixed substring, the index function is about three times as fast as the precompiled regex match, whether or not the substring is found.
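
Speed aside, the two approaches are not always interchangeable: index compares the needle literally, while /$str2/ interpolates it as a pattern, so regex metacharacters in the needle can change what matches. A minimal sketch of that difference, assuming a needle that contains a dot (the helper names and sample strings below are hypothetical, not from the question):

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; the names are not from the question.

# Literal substring test: index never interprets metacharacters.
sub contains_literal {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) != -1 ? 1 : 0;
}

# Unquoted interpolation: the needle is compiled as a pattern.
sub matches_as_pattern {
    my ($haystack, $needle) = @_;
    return $haystack =~ /$needle/ ? 1 : 0;
}

# \Q...\E escapes metacharacters, restoring literal semantics.
sub matches_quoted {
    my ($haystack, $needle) = @_;
    return $haystack =~ /\Q$needle\E/ ? 1 : 0;
}

my $haystack = "version 1x5";
my $needle   = "1.5";    # the dot is a regex metacharacter

print "index:   ", contains_literal($haystack, $needle),   "\n";   # 0
print "pattern: ", matches_as_pattern($haystack, $needle), "\n";   # 1, "." matches "x"
print "quoted:  ", matches_quoted($haystack, $needle),     "\n";   # 0
```

If the needle comes from user input or other untrusted data, either index or the \Q...\E form is probably the safer choice.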

answered Nov 18 '22 22:11 by Toto


When I see code that uses index, I usually see an index within an index within an index, and so on. There is also more branching: "if found, look for this; otherwise, look for that." Almost always, a single regex would have done the job. So I nearly always use a regex unless there is a specific reason to use index.
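
To illustrate the kind of chained index code described above, here is a rough sketch that pulls one field out of a delimited string both ways. The input line and the name= field are made up for the example:

```perl
use strict;
use warnings;

my $line = "id=42;name=alice;role=admin";    # hypothetical input

# index version: each step depends on the previous one succeeding.
my $via_index;
my $start = index($line, "name=");
if ($start != -1) {
    $start += length("name=");
    my $end = index($line, ";", $start);
    $end = length($line) if $end == -1;      # value runs to the end of the string
    $via_index = substr($line, $start, $end - $start);
}

# regex version: one pattern locates the key and captures the value.
my ($via_regex) = $line =~ /name=([^;]*)/;

print "index: $via_index\n";    # alice
print "regex: $via_regex\n";    # alice
```

Both versions give the same result here; the regex version simply folds the "find the key, then find the delimiter" branching into the pattern.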

Unfortunately, most programmers I run into don't read regexes well, so for maintainability index should probably be used more often than I use it.

answered Nov 18 '22 21:11 by kjpires