Given two string variables <code>$string</code> and <code>$needle</code> in <code>perl</code>, what's the most efficient way to check whether <code>$string</code> starts with <code>$needle</code>. <ul> <li> <code>$string =~ /^\Q$needle\E/</code> is the closest match I could think of that does what is required but is the least efficient (by far) of the solutions I tried.</li> <li> <code>index($string, $needle) == 0</code> works and is relatively efficient for some values of <code>$string</code> and <code>$needle</code> but needlessly searches for the needle in other positions (if not found at the start).</li> <li> <code>substr($string, 0, length($needle)) eq $needle</code> should be quite simple and efficient, but in most of my few tests is not more efficient than the previous one.</li> </ul> Is there a canonical way to do that in <code>perl</code> which I wouldn't be aware of or any way to optimise any of the above solutions? (in my particular use case, <code>$string</code> and <code>$needle</code> are going to be different in each run, so precompiling a regexp is not an option). <hr> Example of how to measure the performance of a given solution (here from a POSIX <code>sh</code>): <pre class="prettyprint"><code>string='somewhat not so longish string' needle='somew' time perl -e ' ($n,$string,$needle) = @ARGV; for ($i=0;$i<$n;$i++) { index($string, $needle) == 0 }' 10000000 "$string" "$needle" </code></pre> With those values, <code>index()</code> performs better than <code>substr()+eq</code> with this system with perl 5.14.2, but with: <pre class="prettyprint"><code>string="aaaaabaaaaabaaaaabaaaaabaaaaabaaaaab" needle="aaaaaa" </code></pre> That's reversed.

<pre class="prettyprint"><code>rindex $string, $substring, 0 </code></pre> searches for <code>$substring</code> in <code>$string</code> at position <=0 which is only possible if <code>$substring</code> is a prefix of <code>$string</code>. Example: <pre class="prettyprint"><code>> rindex "abc", "a", 0 0 > rindex "abc", "b", 0 -1 </code></pre>

Most efficient way to check if $string starts with $needle in perl

Given two string variables $string and $needle in perl, what's the most efficient way to check whether $string starts with $needle.

$string =~ /^\Q$needle\E/ is the closest match I could think of that does what is required but is the least efficient (by far) of the solutions I tried.
index($string, $needle) == 0 works and is relatively efficient for some values of $string and $needle but needlessly searches for the needle in other positions (if not found at the start).
substr($string, 0, length($needle)) eq $needle should be quite simple and efficient, but in most of my few tests is not more efficient than the previous one.

Is there a canonical way to do that in perl which I wouldn't be aware of or any way to optimise any of the above solutions?

(in my particular use case, $string and $needle are going to be different in each run, so precompiling a regexp is not an option).

Example of how to measure the performance of a given solution (here from a POSIX sh):

string='somewhat not so longish string' needle='somew' time perl -e '   ($n,$string,$needle) = @ARGV;   for ($i=0;$i<$n;$i++) {      index($string, $needle) == 0    }' 10000000 "$string" "$needle"

With those values, index() performs better than substr()+eq with this system with perl 5.14.2, but with:

string="aaaaabaaaaabaaaaabaaaaabaaaaabaaaaab" needle="aaaaaa"

That's reversed.

What does \s+ mean in Perl?

(\S+) | will match and capture any number (one or more) of non-space characters, followed by a space character (assuming the regular expression isn't modified with a /x flag). In both cases, these constructs appear to be one component of an alternation. Breaking it down: ( .... ) : Group and capture.

What is Vstring in Perl?

A string in Perl is a scalar variable and start with a ($) sign and it can contain alphabets, numbers, special characters. The string can consist of a single word, a group of words or a multi-line paragraph. The String is defined by the user within a single quote (') or double quote (“).

What is in Perl regex?

Regular Expression (Regex or Regexp or RE) in Perl is a special text string for describing a search pattern within a given text. Regex in Perl is linked to the host language and is not the same as in PHP, Python, etc. Sometimes it is termed as “Perl 5 Compatible Regular Expressions“.

rindex $string, $substring, 0

searches for $substring in $string at position <=0 which is only possible if $substring is a prefix of $string. Example:

> rindex "abc", "a", 0 0 > rindex "abc", "b", 0 -1

How important is this, really? I did a number of benchmarks, and the index method averaged 0.68 microseconds per iteration; the regex method 1.14μs; the substr method 0.16μs. Even my worst-case scenarios (2250-char strings that were equal), index took 2.4μs, regex took 5.7μs, and substr took 0.5μs.

My advice is to write a library routine:

sub begins_with {     return substr($_[0], 0, length($_[1])) eq $_[1]; }

and focus your optimization efforts elsewhere.

UPDATE: Based on criticism of my "worst-case" scenario described above, I ran a new set of benchmarks with a 20,000-char randomly-generated string, comparing it against itself and against a string that differed only in the last byte.

For such long strings, the regex solution was by far the worst (a 20,000 character regex is hell): 105μs for the match success, 100μs for the match failure.

The index and substr solutions were still quite fast. index was 11.83μs / 11.86μs for success/failure, and substr was 4.09μs / 4.15μs. Moving the code to a separate function added about 0.222±0.05μs.

Benchmark code available at: http://codepaste.net/2k1y8e

I do not know the characteristics of @Stephane's data, but my advice stands.

Most efficient way to check if $string starts with $needle in perl

Tags:

performance

string-matching

perl

Stephane Chazelas

People also ask

2 Answers

Gregory Kalabin

Sue D. Nymme

Recent Activity

Donate For Us

Most efficient way to check if $string starts with $needle in perl

Tags:

performance

string-matching

perl

Stephane Chazelas

People also ask

2 Answers

Gregory Kalabin

Sue D. Nymme

Related questions

Recent Activity

Donate For Us