I took too long to start using warnings; and strict; in Perl, but now that I do, I see the advantages.
One of the things I'm still not sure about is when to declare a temporary variable. This may seem trivial, but I run a lot of Monte Carlo simulations where losing a bit of time adds up over 10,000+ iterations. I've been lazy about using strict/warnings in quicker simulations, but they've gotten more complex, so I really need to.
So (cutting out the code that does the actual calculations) I am wondering whether this:
sub doStuff
{
    my $temp;
    for my $x (1..50)
    {
        $temp = $x**2;
    }
    for my $x (1..50)
    {
        $temp = $x**3;
    }
}
Or
sub doStuff
{
    for my $x (1..50)
    {
        my $temp = $x**2;
    }
    for my $x (1..50)
    {
        my $temp = $x**3;
    }
}
is more or less efficient, or whether one of them violates some Perl coding practice I don't know about yet.
The efficiency difference between these two is small, and it is dwarfed by any realistic processing. So I'd go by the code: if the $temp is indeed temporary and not needed after the loop, then it is better to keep it inside the loop (lexically scoped), for all the other reasons.
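To illustrate one of those reasons, here is a minimal sketch (the assignment condition is made up for this example) of how an outer-scoped temporary can silently carry a stale value out of its loop, while an inner-scoped one turns the same mistake into a compile-time error under strict:
use strict;
use warnings;

# $temp declared outside the loop survives it, so later code
# can read a leftover value by mistake
my $temp;
for my $x (1 .. 50) {
    $temp = $x**2 if $x % 7 == 0;   # only sometimes assigned
}
# $temp still holds the value from the last matching iteration (49**2);
# code here may use it without realizing it is stale
print "leaked: $temp\n";

# declared inside the loop, $temp does not exist out here at all
for my $x (1 .. 50) {
    my $temp = $x**2;
}
# print $temp;  # compile-time error under strict:
                # Global symbol "$temp" requires explicit package name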
Since this is about optimization, I'd like to digress. Such micro-issues may have an effect. However, where you really gain is first at the level of algorithms, and then by choosing suitable data structures and techniques. Low-level tweaks are the very last thing to think about, and there are often language features and libraries that render them irrelevant. That said, one should know one's tools and not be needlessly wasteful.
Also, there is often a trade-off between code clarity and efficiency. If it comes to that, I suggest coding for correctness and clarity first. Then profile and optimize if needed, cautiously and gradually, and with plenty of testing in between.
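For the profiling step, a minimal sketch using Devel::NYTProf (a CPAN module, not core; the script name here is a placeholder):
# install Devel::NYTProf from CPAN first
perl -d:NYTProf simulation.pl   # run under the profiler; writes ./nytprof.out
nytprofhtml                     # render an HTML report from nytprof.out
The report shows per-statement and per-sub timings, so you optimize what actually costs time instead of guessing.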
Here is a comparison, as an example of basic use of the core module Benchmark. I throw in an additional operation and add other cases where there is no temporary.
use warnings 'all';
use strict;
use Benchmark qw(cmpthese);

my $x;

# temporary declared inside the loop
sub tmp_in {
    for (1..10_000) {
        my $tmp = 2 * $_;
        $x = $tmp + $_;
    }
    return $x;
}

# temporary declared once, outside the loop
sub tmp_out {
    my $tmp;
    for (1..10_000) {
        $tmp = 2 * $_;
        $x = $tmp + $_;
    }
    return $x;
}

# no temporary at all
sub no_tmp {
    for (1..10_000) { $x = 2 * $_ + $_ }
    return $x;
}

# baseline: the cheapest loop body
sub base {
    for (1..10_000) { $x += $_ }
    return $x;
}

# a slightly more expensive "calculation"
sub calc {
    for (1..10_000) { $x += sin sqrt(rand()) }
    return $x;
}

# run each case for at least 10 CPU seconds and print a comparison chart
cmpthese(-10, {
    tmp_in  => sub { tmp_in() },
    tmp_out => sub { tmp_out() },
    no_tmp  => sub { no_tmp() },
    base    => sub { base() },
    calc    => sub { calc() },
});
Output (on v5.16)
          Rate    calc  tmp_in tmp_out  no_tmp    base
calc     623/s      --    -11%    -26%    -44%    -59%
tmp_in   698/s     12%      --    -17%    -37%    -54%
tmp_out  838/s     34%     20%      --    -25%    -44%
no_tmp  1117/s     79%     60%     33%      --    -26%
base    1510/s    142%    116%     80%     35%      --
So they do differ, and apparently a declaration inside the loop costs something. But the tmp versions sit next to each other in the list. Also, the loop body here is nearly pure overhead, so the difference is greatly exaggerated. And there are other aspects: no_tmp runs in one statement, for example. These things may matter only if your processing is mostly iteration itself. Just generating a (high-quality) pseudo-random number is expensive.
This may also differ (wildly) across hardware and software versions. My results with v5.10 on a better machine are somewhat different. Replace the sample 'calculations' with your actual processing and run on the actual hardware, for a relevant measure of whether any of this matters at all.
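For a quick wall-clock measurement of a real run, a minimal sketch using the core module Time::HiRes (run_simulation is a made-up placeholder for your processing):
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# placeholder for your actual Monte Carlo processing
sub run_simulation {
    my $sum = 0;
    $sum += sin sqrt(rand()) for 1 .. 100_000;
    return $sum;
}

my $t0 = [gettimeofday];
run_simulation() for 1 .. 10;                  # time several full iterations
printf "elapsed: %.3f s\n", tv_interval($t0);  # seconds since $t0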
Personally, I would keep the temporary variable in the for loop, simply because that is where it is used. The other way, at some point down the line it will come back to bite you (or the person who has to pick up your code) with an unexpected value.
Also, premature optimization is an anti-pattern:
Optimization can reduce readability and add code that is used only to improve the performance. This may complicate programs or systems, making them harder to maintain and debug. As a result, optimization or performance tuning is often performed at the end of the development stage.