Given a vector addition:
NPNumber NPNumber::plus(const double o) const {
vector<double> c;
for (double a : values)
c.push_back(a + o);
return NPNumber(width, c);
}
Where NPNumber contains a vector of doubles (field values), when I only add a single integer, instead of another NPNumber, is there a performance benefit or penalty compared to converting that integer and using the function above?
i.e., is this faster/slower on any architecture:
NPNumber NPNumber::plus(const int i) const {
vector<double> c;
for (double a : values)
c.push_back(a + i);
return NPNumber(width, c);
}
It's strongly compiler dependent and you should measure it in your code. A quick and simple observation on my machine (32-bit MinGW / gcc 4.9) shows the + itself is equal in both cases, but the integer version seems slightly better.
Adding two doubles:
! double d = 0.2;
fldl 0x409070
fstpl -0x10(%ebp)
! double y = 1.0;
fld1
fstpl -0x18(%ebp)
! double z = d + y;
fldl -0x10(%ebp)
faddl -0x18(%ebp)
fstpl -0x20(%ebp)
Adding an int to a double:
! double d = 0.2;
fldl 0x409070
fstpl -0x28(%ebp)
! int y = 1;
movl $0x1,-0x2c(%ebp)
! double z = d + y;
fildl -0x2c(%ebp)
faddl -0x28(%ebp)
fstpl -0x38(%ebp)
Both use faddl to add, but the compiler uses a different instruction (fildl, which loads the integer and converts it to floating point in one step) before adding. So there is no penalty for adding an integer to a double (and it may even be slightly better than adding two doubles).
In your application, profiling is the best way to find out which one is better.
Another thing to consider is compiler optimizations.
Floating-point units tend to have their own registers, which in some cases may even have greater precision than typical operands (for instance, 80-bit temporary reals); however, see the comments, as this can vary a lot.
I would expect it is cheaper to operate on values already loaded into the FPU, and the compiler should know this. As such, it may hoist the promotion of your constant value out of the loop and keep the value loaded in the FPU, in which case the difference would be negligible on large vectors.
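Done by hand, that hoisting looks like the sketch below. The NPNumber fields and constructor here are assumptions reconstructed from the question; the point is that the int-to-double promotion happens once, before the loop, so the loop body is pure double arithmetic.

```cpp
#include <utility>
#include <vector>
using std::vector;

// Minimal stand-in for the NPNumber in the question (fields assumed).
struct NPNumber {
    int width;
    vector<double> values;
    NPNumber(int w, vector<double> v) : width(w), values(std::move(v)) {}

    NPNumber plus(const int i) const {
        const double o = static_cast<double>(i); // promote once, outside the loop
        vector<double> c;
        c.reserve(values.size());                // avoid reallocation while pushing
        for (double a : values)
            c.push_back(a + o);                  // pure double + double in the loop body
        return NPNumber(width, c);
    }
};
```

In practice an optimizing compiler will usually do this hoisting itself, so writing it out mostly serves to make the cost model explicit.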
In any event, I would hope that if the int-to-double conversion is expensive on a given platform, a respectable compiler would not perform it redundantly. As such, what I'd probably do is make it a template method so you can accept whatever type and precision the constant data naturally comes from; this permits the compiler to "do the right thing" for the particular platform in any given situation.
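A templated version might look like this (a sketch; again the NPNumber layout is an assumption based on the question):

```cpp
#include <utility>
#include <vector>
using std::vector;

struct NPNumber {
    int width;
    vector<double> values;
    NPNumber(int w, vector<double> v) : width(w), values(std::move(v)) {}

    // One method covers int, float, double, ...; the compiler picks the
    // cheapest load/convert sequence for each instantiation.
    template <typename T>
    NPNumber plus(const T o) const {
        vector<double> c;
        c.reserve(values.size());
        for (double a : values)
            c.push_back(a + o);
        return NPNumber(width, c);
    }
};
```

Calling n.plus(1) and n.plus(1.0) now instantiate separate methods, so no conversion is forced on the caller's side.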
With that said, compilers do vary quite a bit in their optimization strategies and platforms vary in their features & performance characteristics, so if you're trying to squeeze out every last microsecond, you should do profiling for your platform(s) of interest.
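For that kind of measurement, even a crude wall-clock timer can reveal a large gap, though it is a sketch rather than a rigorous benchmark; the time_ns helper and the NPNumber stand-in below are mine, not from the question:

```cpp
#include <chrono>
#include <utility>
#include <vector>
using std::vector;

// Stand-in for the question's NPNumber, with both overloads.
struct NPNumber {
    int width;
    vector<double> values;
    NPNumber(int w, vector<double> v) : width(w), values(std::move(v)) {}

    NPNumber plus(const double o) const {
        vector<double> c;
        c.reserve(values.size());
        for (double a : values) c.push_back(a + o);
        return NPNumber(width, c);
    }
    NPNumber plus(const int i) const {
        vector<double> c;
        c.reserve(values.size());
        for (double a : values) c.push_back(a + i);
        return NPNumber(width, c);
    }
};

// Wall-clock time of one call, in nanoseconds.
template <typename F>
long long time_ns(F f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}
```

Compare time_ns([&]{ n.plus(2); }) against time_ns([&]{ n.plus(2.0); }), repeating each many times and looking at medians, since single calls are dominated by noise and allocation.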