I'm looking for examples of code that triggers non-determinism in GCC or Clang's compilation process.
One prominent example is the usage of the __DATE__ macro.
GCC and Clang have a plethora of compiler flags to control the outcome of non-deterministic actions within the compiler, e.g. -frandom-seed and -fno-guess-branch-probability.
Are there any small examples that are affected by these flags?
To be more precise:
$ c++ main.cpp -o main && shasum main
aabbccddee
$ c++ main.cpp -o main && shasum main
eeddccbbaa
I'm looking for macro-free code examples where multiple runs of the compiler lead to different outputs, but where the difference can be fixed by e.g. -frandom-seed.
EDIT:
related: from the gcc docs:
-fno-guess-branch-probability:
Sometimes gcc will opt to use a randomized model to guess branch probabilities,
when none are available from either profiling feedback (-fprofile-arcs)
or __builtin_expect.
This means that different runs of the compiler on the same program
may produce different object code.
The default is -fguess-branch-probability at levels -O, -O2, -O3, -Os.
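To make the quoted passage a bit more concrete, here is a minimal sketch of the __builtin_expect route it mentions. The function names are hypothetical, and the sketch only shows where a branch hint comes from; it does not by itself demonstrate that a given GCC version emits differing object code for the unhinted variant.

```cpp
// Hypothetical helper, not from the question: divides a by b and treats
// b == 0 as the rare case. __builtin_expect is a GCC/Clang builtin that
// tells the compiler which way the branch usually goes, so it does not
// have to estimate a probability for it.
int checked_div(int a, int b) {
    if (__builtin_expect(b == 0, 0)) {  // hint: b == 0 is unlikely
        return 0;                       // fallback for the rare case
    }
    return a / b;
}

// Same logic without a hint: here the compiler relies on its own
// estimate (or on profile data, if any was supplied).
int checked_div_unhinted(int a, int b) {
    if (b == 0) {
        return 0;
    }
    return a / b;
}
```

Both functions return the same values; only the information available to the optimizer differs.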
Non-determinism refers to the inability to objectively predict the outcome or result of a process, either because the cause-and-effect relationship is not known or because the initial conditions cannot be known.
A C++ compiler exhibits non-deterministic behavior if, for the same input program, the object code generated by the compiler differs from run to run.
Nondeterminism means that the path of execution isn't fully determined by the specification of the computation, so the same input can produce different outcomes, while deterministic execution is guaranteed to be the same, given the same input. Related terms to "nondeterministic" are "probabilistic" and "stochastic".
Although the definitive answer is "it depends", it is reasonable to expect that most compilers will be deterministic most of the time, and that the binaries produced should be identical. Indeed, some version control systems depend on this.
While old, this question is interesting for reproducible builds.
As you've stated, there are multiple sources of non-determinism when compiling C/C++ source code.
The preprocessor usually implements a number of predefined macros whose values can change between runs. There are the obvious __DATE__ and __TIME__, but also the less obvious __cplusplus, __STDC_VERSION__ or __GNUC_PATCHLEVEL__, which can change when the compiler is updated (for example as part of an OS update).
There's also __FILE__, which contains the path of the source file in the build environment (and so differs from machine to machine).
Note that for the date and time macros, GCC honors the environment variable SOURCE_DATE_EPOCH to override the values of __DATE__ and __TIME__. Other compilers might behave differently.
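As a small illustration of how these macros end up inside the binary, here is a minimal sketch; the function names build_stamp and source_path are made up for this example. GCC also documents a -fmacro-prefix-map=old=new option for rewriting the path seen by __FILE__, though whether your toolchain version supports it is something to check in its manual.

```cpp
#include <string>

// Returns the compile date and time baked in by the preprocessor.
// __DATE__ expands to "Mmm dd yyyy" and __TIME__ to "hh:mm:ss", so this
// string changes whenever the file is recompiled at a different moment
// (unless the toolchain pins it, e.g. via SOURCE_DATE_EPOCH with GCC).
std::string build_stamp() {
    return std::string(__DATE__) + " " + __TIME__;
}

// Returns the source path as the compiler saw it; building the same
// file from another directory can embed a different string.
std::string source_path() {
    return __FILE__;
}
```

Any translation unit that calls either function carries a string literal that depends on when or where it was built, which is enough to change the hash of the output.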
The compiler might choose different optimization strategies based on a non-deterministic approach. You've cited one in GCC, but others might exist.
For MSVC, you might be interested in the /BREPRO compiler flag.
You'll have to RTFM for your compiler to know more.
On some platforms, the linked object and/or library will contain a timestamp. macOS is one of them. So for the same set of .o files, you'll get a different resulting executable.
Also, if you use Link Time Optimization, many compilers will create different versions of the .o files, with randomly generated names. Again for GCC, you can use -frandom-seed=31415 to "fix" this randomness, but YMMV.
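A macro-free way to try this out might look like the sketch below. The file name lto_demo.cpp and the function names are hypothetical, and the commands in the comment are a suggested experiment under the assumption that g++ was built with LTO support; whether the two unseeded object files actually differ depends on the GCC version, so treat the outcome as something to verify rather than a guarantee.

```cpp
// lto_demo.cpp (hypothetical file name)
//
// Suggested experiment, assuming g++ with LTO support:
//   g++ -O2 -flto -c lto_demo.cpp -o a.o
//   g++ -O2 -flto -c lto_demo.cpp -o b.o
//   cmp a.o b.o        # may report a difference
//   g++ -O2 -flto -frandom-seed=31415 -c lto_demo.cpp -o c.o
//   g++ -O2 -flto -frandom-seed=31415 -c lto_demo.cpp -o d.o
//   cmp c.o d.o        # expected to match if the seed was the only variation

namespace {
// Helper with internal linkage; it has no macros and no external state.
int twice(int x) { return 2 * x; }
}  // namespace

// The only externally visible function in this translation unit.
int answer() { return twice(21); }
```

The source itself is fully deterministic; any difference between a.o and b.o would come from the compiler, not from the code.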
Sometimes repositories contain additional operations that are performed outside of the compilation stage, like generating header files based on some configuration flags (or other steps). In that case, these project-specific operations might not be deterministic either.
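As an illustration of the header-generation case, here is a minimal sketch of a generator step that is careful to be deterministic. Everything in it (the function name render_config_header, the flag names, the output layout) is an assumption made for the example, not something taken from a particular project.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical generator: turns configuration flags into the text of a
// header file. The flags are copied into a std::map first, which orders
// them by name, so the output does not depend on the iteration order of
// the hash table. No timestamp or host name is written either.
std::string render_config_header(
        const std::unordered_map<std::string, int>& flags) {
    const std::map<std::string, int> sorted(flags.begin(), flags.end());
    std::ostringstream out;
    out << "#pragma once\n";
    for (const auto& entry : sorted) {
        out << "#define " << entry.first << ' ' << entry.second << '\n';
    }
    return out.str();
}
```

Iterating the unordered_map directly, or adding a "generated on" comment with the current time, would be two easy ways to make the same step non-deterministic.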
For a good overview of deterministic builds, please refer to this post.