The Perl FAQ entry How do I strip blank space from the beginning/end of a string? states that using
s/^\s+|\s+$//g;
is slower than doing it in two steps:
s/^\s+//;
s/\s+$//;
Why is this combined statement noticeably slower than the separate ones (for any input string)?
The Perl regex runtime runs much quicker when working with 'fixed' or 'anchored' substrings rather than 'floated' substrings. A substring is fixed when you can lock it to a certain place in the source string. Both '^' and '$' provide that anchoring. However, when you use alternation '|', the compiler doesn't recognize the choices as fixed, so it uses less optimized code to scan the whole string. And at the end of the process, looking for fixed strings twice is much, much faster than looking for a floating string once. On a related note, reading perl's regcomp.c will make you go blind.
Update: Here's some additional details. You can run perl with the '-Dr' flag if you've compiled it with debugging support and it'll dump out regex compilation data. Here's what you get:
~# debugperl -Dr -e 's/^\s+//g'
Compiling REx `^\s+'
size 4 Got 36 bytes for offset annotations.
first at 2
synthetic stclass "ANYOF[\11\12\14\15 {unicode_all}]".
1: BOL(2)
2: PLUS(4)
3: SPACE(0)
4: END(0)
stclass "ANYOF[\11\12\14\15 {unicode_all}]" anchored(BOL) minlen 1
# debugperl -Dr -e 's/^\s+|\s+$//g'
Compiling REx `^\s+|\s+$'
size 9 Got 76 bytes for offset annotations.
1: BRANCH(5)
2: BOL(3)
3: PLUS(9)
4: SPACE(0)
5: BRANCH(9)
6: PLUS(8)
7: SPACE(0)
8: EOL(9)
9: END(0)
minlen 1
Note the word 'anchored' in the first dump.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With