
Does volatile qualifier cancel caching for this memory?

This article (http://www.drdobbs.com/parallel/volatile-vs-volatile/212701484?pgno=2) says that the compiler can't apply any optimization to volatile accesses, not even the following one (where: volatile int& v = *(address);):

v = 1;                // C: write to v
local = v;            // D: read from v

can't be optimized to this:

v = 1;                // C: write to v
local = 1;            // D: read from v  // but it can be done for std::atomic<>

This can't be done because, between the first and second lines, the value of v may be changed by a hardware device mapped to this memory location (something other than a CPU, for which cache coherence may not work: a network adapter, GPU, FPGA, etc.), whether the accesses are sequential or concurrent. But this only makes sense if v can't be cached in the CPU's L1/L2/L3 cache, because for an ordinary (non-volatile) variable the time between the first and second lines is so short that the read is likely to hit the cache.
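The difference between the two snippets can be sketched in C++ as follows. This is only an illustration under stated assumptions: `fake_device_register` is a hypothetical stand-in for the mapped address, and both function names are made up for this example. Whether a given compiler actually folds the atomic load is up to the implementation; the point is only that it is permitted to, while the volatile load must be emitted.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a memory-mapped device register.
// Real code would obtain the address from the platform's mapping API.
static volatile int fake_device_register = 0;

// Both accesses must appear in the generated code: the compiler
// may not replace the read with the constant 1.
int write_then_read_volatile() {
    volatile int& v = fake_device_register;
    v = 1;          // C: write to v (a real store is emitted)
    int local = v;  // D: read from v (a real load is emitted)
    return local;
}

// With std::atomic the compiler is allowed, in principle, to forward
// the stored value to the load, because no other thread can be
// proven to intervene between the two operations.
int write_then_read_atomic() {
    std::atomic<int> a{0};
    a.store(1);
    int local = a.load();  // may legally be folded to 1
    return local;
}
```

With no device behind the variable, both functions return 1; the difference is only in which machine instructions the compiler is required to emit, not in whether the CPU cache is involved.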

Does the volatile qualifier guarantee no caching for this memory location?

ANSWER:

  1. No, volatile doesn't guarantee that this memory location is not cached, and there is nothing about this in the C/C++ Standards or the compiler manual.
  2. With a memory-mapped region, when device memory is mapped into CPU memory, the region is already marked as WC (write combining) instead of WB (write back), which disables caching. No cache flushing is needed.
  3. Conversely, if CPU memory is mapped to the device memory, the PCIe controller, which is located on the CPU die, snoops the data arriving via DMA from the device and updates (invalidates) the CPU's L3 cache. In this case, if code running on the device uses volatile to perform the same two lines, it also bypasses the device's cache (e.g. the GPU L2 cache). Neither a GPU cache flush nor a CPU cache flush is needed. The CPU might also need std::atomic_thread_fence(std::memory_order_seq_cst); if the L3 cache (LLC) is coherent with DMA over PCIe but L1/L2 are not. For nVidia CUDA we can use: void __threadfence_system();
  4. We need to flush the DMA controller's cache when sending unaligned data (WDK: KeFlushIoBuffers(), FlushAdapterBuffers()).
  5. Also, we can mark any memory region as uncached (or as WC) ourselves via the MTRR registers.
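The fence mentioned in point 3 could be placed roughly as in the sketch below. Everything here other than `std::atomic_thread_fence` is an assumption for illustration: the `DmaBuffer` layout, the `ready` flag convention and the function name `try_read_payload` are hypothetical, and whether such a fence is sufficient (or needed at all) depends on the platform and on how the buffer is mapped.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical layout of a host buffer that a device fills via DMA.
struct DmaBuffer {
    volatile std::uint32_t ready;  // assumed: device sets this to non-zero last
    std::uint32_t payload[4];      // assumed: data written before `ready`
};

// Returns false if the device has not signalled completion yet.
// Otherwise sums the payload into `sum_out` and returns true.
bool try_read_payload(const DmaBuffer& buf, std::uint32_t& sum_out) {
    if (buf.ready == 0) {  // volatile read: re-loaded on every call
        return false;
    }
    // Full fence so the payload reads below are not reordered
    // before the read of the ready flag.
    std::atomic_thread_fence(std::memory_order_seq_cst);

    std::uint32_t sum = 0;
    for (std::uint32_t word : buf.payload) {
        sum += word;
    }
    sum_out = sum;
    return true;
}
```

In a test without real hardware, the "device" can be simulated by filling `payload` and then setting `ready` from ordinary code before calling `try_read_payload`.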
Alex asked Aug 31 '13 17:08



1 Answer

volatile ensures that the variable won't be "cached" in a CPU register. The CPU cache is transparent to the programmer: if one CPU writes to memory that is held in another CPU's cache, that other CPU's cached copy is invalidated, so it reloads the value from memory on its next access.
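The "not cached in a register" guarantee can be illustrated with a flag that is changed outside the normal control flow. The sketch below uses a signal handler for this, purely as a single-threaded stand-in for an external writer; the names `g_flag`, `on_signal` and `spins_until_flag_set` are hypothetical and not part of any library.

```cpp
#include <cassert>
#include <csignal>

// volatile forces every test of g_flag in the loop below to be a
// real load from memory instead of a value kept in a register.
static volatile std::sig_atomic_t g_flag = 0;

// Signal handler: the "outside" writer in this sketch.
extern "C" void on_signal(int) {
    g_flag = 1;
}

// Spins until the handler has set the flag; raises the signal
// on the third iteration so the loop terminates on its own.
int spins_until_flag_set() {
    g_flag = 0;
    auto previous = std::signal(SIGTERM, on_signal);

    int spins = 0;
    while (g_flag == 0) {  // re-read from memory on every iteration
        ++spins;
        if (spins == 3) {
            std::raise(SIGTERM);  // runs on_signal before returning
        }
    }

    // Restore whatever handler was installed before.
    if (previous != SIG_ERR) {
        std::signal(SIGTERM, previous);
    }
    return spins;
}
```

Without volatile, a compiler that cannot see the handler's write would be free to read `g_flag` once and keep it in a register. None of this says anything about the CPU cache, which stays coherent in hardware either way.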

Something about Cache coherence

As for external memory writes (via DMA or another CPU-independent channel), you might need to flush the cache manually (see this SO question).


C Standard §6.7.3, paragraph 7:

What constitutes an access to an object that has volatile-qualified type is implementation-defined.

Erbureth answered Sep 19 '22 08:09