Since regular array accesses via the [] operator are unchecked, it's no fun to hit the headlines when your program has a remote code execution exploit or data leakage due to a buffer overflow. Most standard array containers provide an at() method that performs bounds-checked access to the elements. This makes an out-of-bounds access well defined (it throws an exception) instead of undefined behavior, which essentially eliminates arbitrary-code-execution exploits via buffer overflow. There is also a clang-tidy check that warns you to use at() when the index is non-constant, so I changed it in quite a few places.
Most managed languages have checked arrays and their compilers can eliminate the checks when they can.
I know C++ compilers can do awesome optimizations. The question is: can C++ compilers eliminate calls to at() when they can see that the index cannot go out of bounds?
Compiler optimization is generally implemented as a sequence of transformations, each of which takes a program and produces a semantically equivalent program that uses fewer resources or executes faster. Compilers are free to perform any such transformation as long as they can guarantee that the observable behavior of the code does not change (the "as-if" rule). Bounds-check elimination fits this model: if the compiler can prove a call to at() can never throw, removing the check is unobservable and therefore allowed.
Here is a classic case that would be subject to bounds check elimination in managed languages: iterating up to the size.
#include <vector>

int test(std::vector<int> &v)
{
    int sum = 0;
    for (size_t i = 0; i < v.size(); i++)
        sum += v.at(i);
    return sum;
}
This is not as trivial to optimize as the case where both the index and the size are constants (which could be solved by constant propagation alone); it requires more advanced reasoning about the relationships between values.
As seen on Godbolt, GCC (9.2), Clang (9.0.0), and even MSVC (v19.22) handle such code reasonably well. GCC and Clang autovectorize; MSVC just generates a basic loop:
$LL4@test:
add eax, DWORD PTR [r9+rdx*4]
inc rdx
cmp rdx, r8
jb SHORT $LL4@test
Which is not that bad, but given that MSVC does vectorize a similar loop that uses [] instead of .at(), I have to conclude that yes, there is a significant cost to using at() even in some basic cases where we might expect otherwise (especially since the emitted loop contains no range check, so the auto-vectorization step gave up for seemingly no reason). If you choose to target only GCC and Clang, there is less of an issue. In trickier cases, GCC and Clang can also be "sufficiently confused", for example when the indexes are passed through a data structure (unlikely code, but the point is that range information can sometimes be lost).