Several times I've read that unpack()
is faster than substr()
, especially as the number of substrings increases. However, this benchmark suggests otherwise. Is my benchmark flawed, or is the alleged performance advantage of unpack()
a holdover from older versions of Perl?
use strict;
use warnings;
use Benchmark;
my ($data, $format_string, $n_substrings);
my %methods = (
unpack => sub { return unpack $format_string, $data },
substr => sub { return map {substr $data, $_, 1} 0 .. $n_substrings - 1 },
);
for my $exp (1 .. 5){
$n_substrings = 10 ** $exp;
print $n_substrings, "\n";
$format_string = 'a1' x $n_substrings;
$data = 9 x $n_substrings;
Benchmark::cmpthese -2, \%methods;
}
Output (on Windows):
10
Rate unpack substr
unpack 131588/s -- -52%
substr 276802/s 110% --
100
Rate unpack substr
unpack 13660/s -- -57%
substr 31636/s 132% --
1000
Rate unpack substr
unpack 1027/s -- -68%
substr 3166/s 208% --
10000
Rate unpack substr
unpack 84.4/s -- -74%
substr 322/s 281% --
100000
Rate unpack substr
unpack 5.46/s -- -82%
substr 30.1/s 452% --
As pointed out in some answers, unpack()
does poorly on Windows. Here's the output on a solaris machine -- not nearly so decisive, but substr()
still wins the foot race:
10
Rate unpack substr
unpack 202274/s -- -4%
substr 210818/s 4% --
100
Rate unpack substr
unpack 22015/s -- -9%
substr 24322/s 10% --
1000
Rate unpack substr
unpack 2259/s -- -9%
substr 2481/s 10% --
10000
Rate unpack substr
unpack 225/s -- -9%
substr 247/s 9% --
100000
Rate unpack substr
unpack 22.0/s -- -10%
substr 24.4/s 11% --
As a matter of fact, your benchmark is flawed, in a really, really interesting way, but what it boils down to is that what you are really comparing is the relative efficiency with which unpack vs. map can throw away a list, because Benchmark::cmpthese() is executing the functions in void context.
The reason your substr comes out on top is this line of code in pp_ctl.c pp_mapwhile():
if (items && gimme != G_VOID) {
i.e. perl's map magically skips a bunch of work (namely allocating storage for the results of the map) if it knows it is being called in void context!
(My hunch on the windows vs. other seen above is that windows-based perl memory allocation is awful, so skipping the allocation is a bigger savings there -- just a hunch, though, I don't have a windows box to play with. But the actual unpack implementation is straight-up C code, and shouldn't differ substantially from windows to other.)
I have three different solutions for working around this issue and generating a more fair comparison:
Here's my version of %methods, with all three versions:
my %methods = (
unpack_assign => sub { my @foo = unpack $format_string, $data; return },
unpack_loop => sub { for my $foo (unpack $format_string, $data) { } },
unpack_return_ref => sub { return [ unpack $format_string, $data ] },
unpack_return_array => sub { return unpack $format_string, $data },
substr_assign => sub { my @foo = map {substr $data, $_, 1} 0 .. ($n_substrings - 1) },
substr_loop => sub { for my $foo ( map {substr $data, $_, 1} 0 .. ($n_substrings - 1)) { } },
substr_return_ref => sub { return [ map {substr $data, $_, 1} 0 .. ($n_substrings - 1) ] },
substr_return_array => sub { return map { substr $data, $_, 1} 0 .. ($n_substrings - 1) },
);
And my results:
$ perl -v
This is perl, v5.10.0 built for x86_64-linux-gnu-thread-multi
$ perl foo.pl
10
Rate substr_assign substr_return_ref substr_loop unpack_assign unpack_return_ref unpack_loop unpack_return_array substr_return_array
substr_assign 101915/s -- -20% -21% -28% -51% -51% -65% -69%
substr_return_ref 127224/s 25% -- -1% -10% -39% -39% -57% -62%
substr_loop 128484/s 26% 1% -- -9% -38% -39% -56% -61%
unpack_assign 141499/s 39% 11% 10% -- -32% -32% -52% -57%
unpack_return_ref 207144/s 103% 63% 61% 46% -- -1% -29% -37%
unpack_loop 209520/s 106% 65% 63% 48% 1% -- -28% -37%
unpack_return_array 292713/s 187% 130% 128% 107% 41% 40% -- -12%
substr_return_array 330827/s 225% 160% 157% 134% 60% 58% 13% --
100
Rate substr_assign substr_loop substr_return_ref unpack_assign unpack_return_ref unpack_loop unpack_return_array substr_return_array
substr_assign 11818/s -- -25% -25% -26% -53% -55% -63% -70%
substr_loop 15677/s 33% -- -0% -2% -38% -40% -51% -60%
substr_return_ref 15752/s 33% 0% -- -2% -37% -40% -51% -60%
unpack_assign 16061/s 36% 2% 2% -- -36% -39% -50% -59%
unpack_return_ref 25121/s 113% 60% 59% 56% -- -4% -22% -35%
unpack_loop 26188/s 122% 67% 66% 63% 4% -- -19% -33%
unpack_return_array 32310/s 173% 106% 105% 101% 29% 23% -- -17%
substr_return_array 38910/s 229% 148% 147% 142% 55% 49% 20% --
1000
Rate substr_assign substr_return_ref substr_loop unpack_assign unpack_return_ref unpack_loop unpack_return_array substr_return_array
substr_assign 1309/s -- -23% -25% -28% -52% -54% -62% -67%
substr_return_ref 1709/s 31% -- -3% -6% -38% -41% -51% -57%
substr_loop 1756/s 34% 3% -- -3% -36% -39% -49% -56%
unpack_assign 1815/s 39% 6% 3% -- -34% -37% -48% -55%
unpack_return_ref 2738/s 109% 60% 56% 51% -- -5% -21% -32%
unpack_loop 2873/s 120% 68% 64% 58% 5% -- -17% -28%
unpack_return_array 3470/s 165% 103% 98% 91% 27% 21% -- -14%
substr_return_array 4015/s 207% 135% 129% 121% 47% 40% 16% --
10000
Rate substr_assign substr_return_ref substr_loop unpack_assign unpack_return_ref unpack_loop unpack_return_array substr_return_array
substr_assign 131/s -- -23% -27% -28% -52% -55% -63% -67%
substr_return_ref 171/s 30% -- -5% -6% -38% -42% -52% -57%
substr_loop 179/s 37% 5% -- -1% -35% -39% -50% -55%
unpack_assign 181/s 38% 6% 1% -- -34% -38% -49% -55%
unpack_return_ref 274/s 109% 60% 53% 51% -- -6% -23% -32%
unpack_loop 293/s 123% 71% 63% 62% 7% -- -18% -27%
unpack_return_array 356/s 171% 108% 98% 96% 30% 21% -- -11%
substr_return_array 400/s 205% 134% 123% 121% 46% 37% 13% --
100000
Rate substr_assign substr_return_ref substr_loop unpack_assign unpack_return_ref unpack_loop unpack_return_array substr_return_array
substr_assign 13.0/s -- -22% -26% -29% -51% -55% -63% -67%
substr_return_ref 16.7/s 29% -- -5% -8% -37% -43% -52% -58%
substr_loop 17.6/s 36% 5% -- -3% -33% -40% -50% -56%
unpack_assign 18.2/s 40% 9% 3% -- -31% -37% -48% -54%
unpack_return_ref 26.4/s 103% 58% 50% 45% -- -9% -25% -34%
unpack_loop 29.1/s 124% 74% 65% 60% 10% -- -17% -27%
unpack_return_array 35.1/s 170% 110% 99% 93% 33% 20% -- -12%
substr_return_array 39.7/s 206% 137% 125% 118% 50% 36% 13% --
So back to the original question: "is unpack() ever faster than substr()?" Answer: always, for this type of application -- unless you don't care about the return values ;)
The test is not flawed, but it is skewed. substr is better if all you need to do is extract a fairly simple substring from a string, but thats about it. For example, even this simple task is not easily done using substr:
$foo = '123foo456789bar89012';
my ($t1,$t2,$t3,$t4,$t5) = unpack("A3A3A6A3A5",$foo);
Thats where you should see a dramatic difference between substr and unpack.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With