
Perl performance hit between two common regex methods for string trimming?

Tags: regex, perl

So I'm working on a Perl script that does a large amount of processing (nothing too complicated, but lots of it) and decided to do a little benchmark to compare two common methods of trimming strings.

The first method is a quick one-liner:

$word =~ s/^\s+|\s+$//g;

The second method is a little longer, but does the same thing:

$word =~ s/^\s+//;
$word =~ s/\s+$//;
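Before comparing speed, it is worth confirming that the two methods really produce the same result. This is a minimal sketch. The sub names `trim_alt` and `trim_two` and the sample strings are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Method 1: a single substitution using an alternation and /g
sub trim_alt {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Method 2: two separate anchored substitutions
sub trim_two {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Hypothetical sample inputs, including a few edge cases
my @samples = ("  hello  ", "hello", "\tfoo bar\n", "   ", "", "a  b");
for my $in (@samples) {
    my ($x, $y) = (trim_alt($in), trim_two($in));
    die "methods disagree on '$in'\n" unless $x eq $y;
}
print "both methods agree on all samples\n";
```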

For my benchmarks, I had the script read from a file with 40 million lines and trim each one, doing nothing else. The average line length is under 20 bytes.

The first method took on average 87 seconds to complete.
The second method took on average 27 seconds to complete.
Doing no processing (just reading each line and continuing) takes an average of 16 seconds.
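A comparable in-memory measurement can be sketched with Perl's core Benchmark module, which takes file I/O out of the timing. The generated sample lines and the labels `alternation` and `two_regexes` are assumptions standing in for the real 40-million-line file, so the absolute numbers will differ from the ones above:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: short padded lines (under 20 bytes each),
# standing in for the real input file.
my @lines = map { "  word$_ \t" } 1 .. 1000;

# A negative count means "run each sub for at least this many CPU seconds".
my $rows = cmpthese(-1, {
    # Method 1: alternation with /g
    alternation => sub {
        for my $line (@lines) {
            my $w = $line;
            $w =~ s/^\s+|\s+$//g;
        }
    },
    # Method 2: two separate anchored substitutions
    two_regexes => sub {
        for my $line (@lines) {
            my $w = $line;
            $w =~ s/^\s+//;
            $w =~ s/\s+$//;
        }
    },
});
```

`cmpthese` prints a table of rates and relative speeds, and also returns that table as a reference to an array of rows (a header row plus one row per labeled sub).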

The first method (on its first pass) matches either the leading or the trailing whitespace and removes it, then matches and removes the whitespace on the other side.
The second method matches and removes all leading whitespace, then matches and removes all trailing whitespace.
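Because `s///` returns the number of substitutions it made, the two-pass behavior described above can be observed directly. A small sketch with a hypothetical sample string:

```perl
use strict;
use warnings;

my $word = "  hello  ";

# s///g returns how many substitutions it performed
my $count = ($word =~ s/^\s+|\s+$//g);
print "one-liner: $count substitutions, result '$word'\n";
# prints: one-liner: 2 substitutions, result 'hello'

my $other = "  hello  ";
my $lead  = ($other =~ s/^\s+//);   # 1 if leading whitespace was removed
my $trail = ($other =~ s/\s+$//);   # 1 if trailing whitespace was removed
print "two-step: $lead + $trail substitutions, result '$other'\n";
# prints: two-step: 1 + 1 substitutions, result 'hello'
```

Both approaches perform two substitutions on this input; the difference lies in how much searching the engine does to find them.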

Maybe I'm wrong here, but why would the second method be more than 3x faster than the first?

Mr. Llama asked Dec 03 '22 00:12


1 Answer

The regex engine has to do more work in the first case, namely backtracking to evaluate the alternatives. You can see the difference in the compiled regex program and match trace that the re pragma's debug mode prints for each version:

echo " hello " | perl -Mre=debug -ple 's/^\s+|\s+$//g'
echo " hello " | perl -Mre=debug -ple 's/^\s+//;s/\s+$//'
JRFerguson answered Dec 18 '22 07:12