 

Is undefined behavior only an issue if you are deploying on several platforms?

Most of the conversations around undefined behavior (UB) talk about how there are some platforms that can do this, or some compilers do that.

What if you are only interested in one platform and only one compiler (same version) and you know you will be using them for years?

Nothing is changing but the code, and the UB is not implementation-defined.

Once the UB has manifested for that architecture and that compiler and you have tested, can't you assume that from then on whatever the compiler did with the UB the first time, it will do that every time?

Note: I know undefined behavior is very, very bad. But when I pointed out UB in code written by somebody in this situation, they asked this question, and I had nothing better to say than: if you ever have to upgrade or port, all the UB will be very expensive to fix.

It seems there are different categories of behavior:

  1. Defined - behavior documented to work by the standard
  2. Supported - behavior documented to be supported, a.k.a. implementation-defined
  3. Extensions - documented additions; support for low-level bit operations like popcount and branch hints falls into this category
  4. Constant - not documented, but likely to be consistent on a given platform; things like endianness and sizeof(int) are not portable, yet are unlikely to change
  5. Reasonable - generally safe and usually legacy, such as casting from unsigned to signed or using the low bit of a pointer as temporary space
  6. Dangerous - reading uninitialized or unallocated memory, returning a pointer or reference to a local variable, using memcpy on a non-POD class

It would seem that Constant might be invariant within a patch version on one platform. The line between Reasonable and Dangerous keeps shifting: more and more behavior moves towards Dangerous as compilers become more aggressive in their optimizations.
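For what it's worth, a small sketch of how such "Constant" assumptions can at least be turned into build-time checks, so a toolchain or platform change fails loudly instead of silently changing behavior (the particular values are just examples of what some codebase might assume):

```cpp
#include <climits>

// Pin down "Constant" category assumptions as build-time checks.
static_assert(CHAR_BIT == 8, "code assumes 8-bit bytes");
static_assert(sizeof(int) == 4, "code assumes 32-bit int");
static_assert(sizeof(void*) == 8, "code assumes 64-bit pointers");
// Endianness can be checked at build time from C++20 (std::endian::native)
// or with a small run-time check at start-up before that.
```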

Glenn Teitelbaum asked Aug 28 '15
1 Answer

OS changes, innocuous system changes (different hardware version!), or compiler changes can all cause previously "working" UB to not work.

But it is worse than that.

Sometimes a change to an unrelated compilation unit, or to far-away code in the same compilation unit, can cause previously "working" UB to stop working. As an example, take two inline functions or methods with different definitions but the same signature: one is silently discarded during linking, and a completely innocuous code change can flip which one is discarded.
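A sketch of that kind of silent collision, using two hypothetical translation units (this is an ODR violation, so it is UB and no diagnostic is required):

```cpp
// a.cpp
inline int flag_size() { return 4; }      // one definition...
int size_from_a() { return flag_size(); }

// b.cpp
inline int flag_size() { return 8; }      // ...and a conflicting one with the same signature
int size_from_b() { return flag_size(); }

// After linking, only one flag_size() survives. Both size_from_a() and
// size_from_b() silently call whichever copy the linker kept, and an
// unrelated change (say, reordering object files) can flip which one that is.
```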

Code that works in one context can suddenly stop working with the same compiler, OS, and hardware when you use it in a different context. An example of this is violating strict aliasing: the compiled code might work when called at spot A, but when inlined (possibly at link time!) the code can change meaning.
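A minimal sketch of such a strict-aliasing violation, together with the well-defined memcpy alternative (the function names are just illustrative):

```cpp
#include <cstdint>
#include <cstring>

float flip_sign_ub(float f) {
    // UB: a float object is accessed through a uint32_t lvalue.
    std::uint32_t* p = reinterpret_cast<std::uint32_t*>(&f);
    *p ^= 0x80000000u;   // may appear to work when the call isn't inlined...
    return f;            // ...but the optimizer may assume *p never touched f
}

float flip_sign_ok(float f) {
    // Well-defined: copy the bytes instead of aliasing the object.
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits ^= 0x80000000u;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```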

Your code, if part of a larger project, could conditionally call some 3rd party code (say, a shell extension that previews an image type in a file open dialog) that changes the state of some flags (floating point precision, locale, integer overflow flags, division by zero behavior, etc). Your code, which worked fine before, now exhibits completely different behavior.
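A rough illustration of the floating-point-state case, assuming a typical x86-64 toolchain; note that touching the FP environment without #pragma STDC FENV_ACCESS is itself a caveat, which is part of the point. A hypothetical "plugin" quietly changes the rounding mode and the caller's results change even though the caller's code did not:

```cpp
#include <cfenv>
#include <cstdio>

// Stand-in for third-party code (a plugin, shell extension, etc.)
// that changes global floating-point state behind your back.
void third_party_hook() {
    std::fesetround(FE_UPWARD);
}

int main() {
    volatile double a = 1.0, b = 3.0;        // volatile keeps the division at run time
    std::printf("before: %.20f\n", a / b);   // rounded to nearest
    third_party_hook();
    std::printf("after:  %.20f\n", a / b);   // now rounded upward: last bits differ
}
```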

Next, many kinds of undefined behavior are inherently non-deterministic. Accessing memory through a pointer after it is freed (even writing to it) might be safe 99 times out of 100, but 1 time in 100 the page has been reclaimed, or something else was written there before you got to it. Now you have memory corruption. It passes all your tests, but you lacked complete knowledge of what can go wrong.
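A minimal sketch of that non-determinism, a read after free that may pass every test run:

```cpp
#include <cstdio>

int main() {
    int* p = new int(42);
    delete p;
    // UB: this run may still print 42, but the allocator may have reused or
    // unmapped the memory; a later run, a different build, or a different
    // call site can print garbage or crash.
    std::printf("%d\n", *p);
}
```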

By using undefined behavior, you commit yourself to a complete understanding of the C++ standard, everything your compiler can do in that situation, and every way the runtime environment can react. You have to audit the produced assembly, not the C++ source, possibly for the entire program, every time you build it! You also commit everyone who reads that code, or who modifies that code, to that level of knowledge.

It is sometimes still worth it.

Fastest Possible Delegates uses UB and knowledge about calling conventions to be a really fast non-owning std::function-like type.

Impossibly Fast Delegates competes. It is faster in some situations, slower in others, and is compliant with the C++ standard.
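To illustrate the standard-compliant end of that spectrum, here is a minimal non-owning, type-erased delegate in the same spirit; this is a sketch, not the actual library code:

```cpp
#include <cstdio>
#include <utility>

// A non-owning "delegate": a void* to the callable plus a plain function
// pointer that knows how to cast it back and invoke it. No UB involved.
template <class Signature>
class function_ref;

template <class R, class... Args>
class function_ref<R(Args...)> {
    void* obj_ = nullptr;
    R (*call_)(void*, Args...) = nullptr;

public:
    template <class F>
    function_ref(F& f)
        : obj_(static_cast<void*>(&f)),
          call_([](void* o, Args... args) -> R {
              return (*static_cast<F*>(o))(std::forward<Args>(args)...);
          }) {}

    R operator()(Args... args) const {
        return call_(obj_, std::forward<Args>(args)...);
    }
};

int main() {
    int base = 10;
    auto add = [&](int x) { return base + x; };
    function_ref<int(int)> ref(add);   // non-owning view of the lambda
    std::printf("%d\n", ref(5));       // prints 15
}
```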

Using the UB might be worth it, for the performance boost. It is rare that you gain something other than performance (speed or memory usage) from such UB hackery.

Another example I've seen is when we had to register a callback with a poor C API that just took a function pointer. We'd create a function (compiled without optimization), copy it to another page, modify a pointer constant within that function, then mark that page as executable, allowing us to secretly pass a pointer along with the function pointer to the callback.

An alternative implementation would be to have some fixed-size set of functions (10? 100? 1000? 1 million?), all of which look up a std::function in a global array and invoke it. This would put a limit on how many such callbacks we could install at any one time, but in practice it was sufficient.
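A sketch of that alternative, assuming a hypothetical pool size and a void(int) callback signature (both are just placeholders):

```cpp
#include <array>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <utility>

using CCallback = void (*)(int);   // what the C API accepts

constexpr std::size_t kMaxCallbacks = 16;                       // assumed limit
std::array<std::function<void(int)>, kMaxCallbacks> g_slots;    // the state lives here

// One plain function per slot, each forwarding to its std::function.
template <std::size_t I>
void trampoline(int value) { g_slots[I](value); }

template <std::size_t... Is>
constexpr auto make_table(std::index_sequence<Is...>) {
    return std::array<CCallback, sizeof...(Is)>{ &trampoline<Is>... };
}
constexpr auto g_table = make_table(std::make_index_sequence<kMaxCallbacks>{});

// Store the state-capturing callable, hand the C API a plain function pointer.
CCallback register_callback(std::size_t slot, std::function<void(int)> fn) {
    g_slots[slot] = std::move(fn);
    return g_table[slot];
}

int main() {
    int captured = 42;
    CCallback fp = register_callback(0, [captured](int x) {
        std::printf("callback got %d, captured %d\n", x, captured);
    });
    fp(7);   // what the C API would eventually call
}
```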

Yakk - Adam Nevraumont answered Sep 18 '22