Why is s/^\s+|\s+$//g; so much slower than two separate substitutions?

Question

The Perl FAQ entry How do I strip blank space from the beginning/end of a string? states that using

s/^\s+|\s+$//g;

is slower than doing it in two steps:

s/^\s+//;
s/\s+$//;

Why is this combined statement noticeably slower than the separate ones (for any input string)?

sidereal · Accepted Answer

The Perl regex runtime runs much quicker when working with 'fixed' or 'anchored' substrings rather than 'floated' substrings. A substring is fixed when you can lock it to a certain place in the source string. Both '^' and '$' provide that anchoring. However, when you use alternation '|', the compiler doesn't recognize the choices as fixed, so it uses less optimized code to scan the whole string. And at the end of the process, looking for fixed strings twice is much, much faster than looking for a floating string once. On a related note, reading perl's regcomp.c will make you go blind.

Update: Here's some additional details. You can run perl with the '-Dr' flag if you've compiled it with debugging support and it'll dump out regex compilation data. Here's what you get:

~# debugperl -Dr -e 's/^\s+//g'
Compiling REx `^\s+'
size 4 Got 36 bytes for offset annotations.
first at 2
synthetic stclass "ANYOF[\11\12\14\15 {unicode_all}]".
   1: BOL(2)
   2: PLUS(4)
   3:   SPACE(0)
   4: END(0)
stclass "ANYOF[\11\12\14\15 {unicode_all}]" anchored(BOL) minlen 1

# debugperl -Dr -e 's/^\s+|\s+$//g'
Compiling REx `^\s+|\s+$'
size 9 Got 76 bytes for offset annotations.

   1: BRANCH(5)
   2:   BOL(3)
   3:   PLUS(9)
   4:     SPACE(0)
   5: BRANCH(9)
   6:   PLUS(8)
   7:     SPACE(0)
   8:   EOL(9)
   9: END(0)
minlen 1

Note the word 'anchored' in the first dump.

Why is s/^\s+|\s+$//g; so much slower than two separate substitutions?

Tags:

regex

trim

perl

Eugene Yarmash

1 Answers

sidereal

Recent Activity

Donate For Us

Why is s/^\s+|\s+$//g; so much slower than two separate substitutions?

Tags:

regex

trim

perl

Eugene Yarmash

1 Answers

sidereal

Related questions

Recent Activity

Donate For Us