Is the definition of "volatile" this volatile, or is GCC having some standards compliance problems?

Tags:

c++

c

gcc

standards

I need a function that (like SecureZeroMemory from the WinAPI) always zeros memory and doesn't get optimized away, even if the compiler thinks the memory is never going to be accessed again after that. Seems like a perfect candidate for volatile. But I'm having some problems actually getting this to work with GCC. Here is an example function:

    void volatileZeroMemory(volatile void* ptr, unsigned long long size)
    {
        volatile unsigned char* bytePtr = (volatile unsigned char*)ptr;

        while (size--)
        {
            *bytePtr++ = 0;
        }
    }

Simple enough. But the code that GCC actually generates if you call it varies wildly with the compiler version and the number of bytes you're actually trying to zero. https://godbolt.org/g/cMaQm2

  • GCC 4.4.7 and 4.5.3 never ignore the volatile.
  • GCC 4.6.4 and 4.7.3 ignore volatile for array sizes 1, 2, and 4.
  • GCC 4.8.1 through 4.9.2 ignore volatile for array sizes 1 and 2.
  • GCC 5.1 through 5.3 ignore volatile for array sizes 1, 2, 4, 8.
  • GCC 6.1 just ignores it for any array size (bonus points for consistency).

Any other compiler I have tested (clang, icc, vc) generates the stores one would expect, with any compiler version and any array size. So at this point I'm wondering, is this a (pretty old and severe?) GCC compiler bug, or is the definition of volatile in the standard that imprecise that this is actually conforming behavior, making it essentially impossible to write a portable "SecureZeroMemory" function?

Edit: Some interesting observations.

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <atomic>

    void callMeMaybe(char* buf);

    void volatileZeroMemory(volatile void* ptr, std::size_t size)
    {
        for (auto bytePtr = static_cast<volatile std::uint8_t*>(ptr); size-- > 0; )
        {
            *bytePtr++ = 0;
        }

        //std::atomic_thread_fence(std::memory_order_release);
    }

    std::size_t foo()
    {
        char arr[8];
        callMeMaybe(arr);
        volatileZeroMemory(arr, sizeof arr);
        return sizeof arr;
    }

The possible write from callMeMaybe() will make all GCC versions except 6.1 generate the expected stores. Uncommenting the memory fence will also make GCC 6.1 generate the stores, although only in combination with the possible write from callMeMaybe().

Someone has also suggested flushing the caches. Microsoft does not try to flush the cache at all in "SecureZeroMemory". The cache is likely going to be invalidated pretty fast anyway, so this is probably not a big deal. Also, if another program was trying to probe the data, or if it was going to be written to the page file, it would always be the zeroed version.

There are also some concerns about GCC 6.1 using memset() in the standalone function. The GCC 6.1 compiler on godbolt might be a broken build, as GCC 6.1 seems to generate a normal loop (like 5.3 does on godbolt) for the standalone function for some people. (Read the comments on zwol's answer.)

cooky451 asked Jul 06 '16 18:07


2 Answers

GCC's behavior may be conforming, and even if it isn't, you should not rely on volatile to do what you want in cases like these. The C committee designed volatile for memory-mapped hardware registers and for variables modified during abnormal control flow (e.g. signal handlers and setjmp). Those are the only things it is reliable for. It is not safe to use as a general "don't optimize this out" annotation.

In particular, the standard is unclear on a key point. (I've converted your code to C; there shouldn't be any divergence between C and C++ here. I've also manually done the inlining that would happen before the questionable optimization, to show what the compiler "sees" at that point.)

    #include <stddef.h>

    extern void use_arr(void *, size_t);

    void foo(void)
    {
        char arr[8];
        use_arr(arr, sizeof arr);

        for (volatile char *p = (volatile char *)arr;
             p < (volatile char *)(arr + 8);
             p++)
          *p = 0;
    }

The memory-clearing loop accesses arr through a volatile-qualified lvalue, but arr itself is not declared volatile. It is, therefore, at least arguably allowed for the C compiler to infer that the stores made by the loop are "dead", and delete the loop altogether. There's text in the C Rationale that implies that the committee meant to require those stores to be preserved, but the standard itself does not actually make that requirement, as I read it.

For more discussion of what the standard does or does not require, see "Why is a volatile local variable optimised differently from a volatile argument, and why does the optimiser generate a no-op loop from the latter?", "Does accessing a declared non-volatile object through a volatile reference/pointer confer volatile rules upon said accesses?", and GCC bug 71793.

For more on what the committee thought volatile was for, search the C99 Rationale for the word "volatile". John Regehr's paper "Volatiles are Miscompiled" illustrates in detail how programmer expectations for volatile may not be satisfied by production compilers. The LLVM team's series of essays "What Every C Programmer Should Know About Undefined Behavior" does not touch specifically on volatile but will help you understand how and why modern C compilers are not "portable assemblers".


To the practical question of how to implement a function that does what you wanted volatileZeroMemory to do: Regardless of what the standard requires or was meant to require, it would be wisest to assume that you can't use volatile for this. There is an alternative that can be relied on to work, because it would break far too much other stuff if it didn't work:

    #include <stddef.h>
    #include <string.h>

    extern void memory_optimization_fence(void *ptr, size_t size);

    inline void explicit_bzero(void *ptr, size_t size)
    {
       memset(ptr, 0, size);
       memory_optimization_fence(ptr, size);
    }

    /* in a separate source file */
    void memory_optimization_fence(void *unused1, size_t unused2) {}

However, you must make absolutely sure that memory_optimization_fence is not inlined under any circumstances. It must be in its own source file and it must not be subjected to link-time optimization.

There are other options, relying on compiler extensions, that may be usable under some circumstances and can generate tighter code (one of them appeared in a previous edition of this answer), but none are universal.
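For illustration only, here is a minimal sketch of one commonly cited extension-based variant for GCC and Clang. It may or may not be the one this answer refers to. The function name zero_with_barrier is hypothetical, and the approach assumes the GNU extended inline-asm syntax, so it is not portable to compilers that lack it. The idea is that an empty asm statement that receives the pointer and declares a "memory" clobber has to be treated by the compiler as if it might read the buffer, so the preceding memset cannot be considered dead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper name; not taken from the original answer. */
static void zero_with_barrier(void *ptr, size_t size)
{
    assert(ptr != NULL || size == 0);
    if (size == 0)
        return;

    memset(ptr, 0, size);

#if defined(__GNUC__) || defined(__clang__)
    /* Emits no instructions. Passing ptr as an input together with the
     * "memory" clobber makes the compiler assume the asm may read the
     * memory ptr points to, so the memset above has to be kept. */
    __asm__ __volatile__("" : : "r"(ptr) : "memory");
#else
#error "This sketch assumes GNU-style extended inline asm."
#endif
}
```

Like the separate-file fence, this only stops the compiler from deleting the stores; it does nothing about copies of the data held in registers or elsewhere.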

(I recommend calling the function explicit_bzero, because it is available under that name in more than one C library. There are at least four other contenders for the name, but each has been adopted only by a single C library.)

You should also know that, even if you can get this to work, it may not be enough. In particular, consider

    struct aes_expanded_key { __uint128_t rndk[16]; };

    void encrypt(const char *key, const char *iv,
                 const char *in, char *out, size_t size)
    {
        aes_expanded_key ek;
        expand_key(key, ek);
        encrypt_with_ek(ek, iv, in, out, size);
        explicit_bzero(&ek, sizeof ek);
    }

Assuming hardware with AES acceleration instructions, if expand_key and encrypt_with_ek are inline, the compiler may be able to keep ek entirely in the vector register file -- until the call to explicit_bzero, which forces it to copy the sensitive data onto the stack just to erase it, and, worse, doesn't do a darn thing about the keys that are still sitting in the vector registers!

zwol answered Sep 19 '22 00:09


I need a function that (like SecureZeroMemory from the WinAPI) always zeros memory and doesn't get optimized away,

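This is what the standard function memset_s is for. Note that it belongs to the optional Annex K of C11, so it is only available on implementations that define __STDC_LIB_EXT1__.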
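A minimal sketch of how a call to memset_s might be wrapped, assuming a C11 compiler. The wrapper name secure_zero is hypothetical. The fallback branch is an assumption of this sketch, not something the standard guarantees: it calls memset through a volatile function pointer, a widely used trick that relies on the compiler not knowing which function the pointer holds at the time of the call.

```c
/* Must be defined before the first #include to request Annex K names. */
#define __STDC_WANT_LIB_EXT1__ 1
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper name. Returns 0 on success. */
static int secure_zero(void *ptr, size_t size)
{
    assert(ptr != NULL || size == 0);
    if (size == 0)
        return 0;

#ifdef __STDC_LIB_EXT1__
    /* Annex K: memset_s(dest, destsz, ch, count); nonzero means failure. */
    return memset_s(ptr, size, 0, size);
#else
    /* Assumed fallback: the pointer object itself is volatile, so it has
     * to be read at run time and the call cannot be resolved to a memset
     * that the optimizer recognizes as a dead store. */
    void *(*volatile memset_ptr)(void *, int, size_t) = memset;
    memset_ptr(ptr, 0, size);
    return 0;
#endif
}
```

Many common C libraries do not implement Annex K, so in practice the fallback branch (or a platform function such as explicit_bzero or SecureZeroMemory) is what ends up being compiled.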


As to whether this behavior with volatile is conforming or not, that's a bit hard to say, and compiler implementations of volatile have long been reported to be buggy.

One issue is that the specs say that "Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine." But that only refers to 'volatile objects', not accessing a non-volatile object via a pointer that has had volatile added. So apparently if a compiler can tell that you're not really accessing a volatile object then it's not required to treat the object as volatile after all.
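To make the distinction concrete, here is a small sketch of the two cases. The function names are hypothetical. In the first, the object itself is declared volatile, so the quoted sentence clearly applies. In the second, only the lvalue used for the access is volatile-qualified, which is the situation in the question and the one where the wording leaves room for argument.

```c
#include <assert.h>
#include <stddef.h>

/* Case 1: the array is a volatile object, so every store below is an
 * access to a volatile object in the sense of the quoted sentence. */
static void clear_volatile_object(void)
{
    volatile unsigned char secret[8];
    for (size_t i = 0; i < sizeof secret; i++)
        secret[i] = 0;
}

/* Case 2: buf points to an object that is not declared volatile; only
 * the pointer type adds the qualifier. Whether these stores must be
 * preserved is the disputed point. */
static void clear_through_volatile_pointer(unsigned char *buf, size_t size)
{
    volatile unsigned char *p = buf;
    assert(buf != NULL || size == 0);
    while (size--)
        *p++ = 0;
}
```

Both functions zero their bytes when the stores are actually emitted; the difference lies only in what the compiler is arguably permitted to remove when it can prove the memory is never read again.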

bames53 answered Sep 19 '22 00:09