 

Memory barrier on single core ARM

There is a lot of information about memory barriers. Most of it refers to multicore or multiprocessor architectures. Somewhere here on Stack Overflow it is also stated that memory barriers are not required on single-core processors.

So far I cannot find any clear explanation of why they should not be required on single-core CPUs. Suppose a load and a store are reordered in thread A, and a context switch occurs between the two instructions. In this case thread B might not react as expected. Why would a context switch on a single core behave differently from two threads on different cores (apart from any cache coherency issues)?

For example, some info from the ARM website:

"It is architecturally defined that software must perform a Data Memory Barrier (DMB) operation:

  • between acquiring a resource, for example, through locking a mutex (MUTual EXclusion) or decrementing a semaphore, and making any access to that resource
  • before making a resource available, for example, through unlocking a mutex or incrementing a semaphore"

This sounds very clear; however, in the provided example they refer explicitly to a multicore configuration.

Waldorf asked Feb 02 '15


People also ask

How does memory barrier work?

Memory barrier instructions stall execution until preceding memory writes have completed. They are used to ensure that the memory effects of a critical section of code are visible before execution of the application continues.

Why do we use memory barriers?

Memory barriers are used to provide control over the order of memory accesses. This is necessary sometimes because optimizations performed by the compiler and hardware can cause memory to be accessed in a different order than intended by the developer.

What is data synchronization barrier?

Data Synchronization Barrier (DSB): The DSB instruction is a special memory barrier that synchronizes the execution stream with memory accesses. The DSB instruction takes the required shareability domain and required access types as arguments; see Shareability and access limitations on the data barrier operations.



1 Answer

Why would a context switch on a single core behave differently from two threads on different cores (apart from any cache coherency issues)?

Threads on separate cores can act at exactly the same time, but you can still have ordering issues on a single core.

Somewhere here on Stack Overflow it is also stated that memory barriers are not required on single-core processors.

This information may be taken out of context (or may not provide enough context).


Wikipedia's Memory barrier and Memory ordering pages have sections Out-of-order execution versus compiler reordering optimizations and Compile time/Run time ordering. There are many places in a pipeline where the ordering of memory may matter. In some cases, this may be taken care of by the compiler, by the OS, or by our own code.

Compiler memory barriers apply to a single CPU. They are especially useful with hardware where the ordering and timing of writes and reads matter.

Linux defines several types of memory barriers:

  1. Write/Store.
  2. Data dependency.
  3. Read/Load.
  4. General memory barriers.

Mainly these map fairly well to DMB (DSB and IMB are more for code modification).

The more advanced ARM CPUs have multiple load/store units. In theory, a non-preemptive thread switch (see Note 1), especially with aliased memory, could cause issues for a multi-threaded application on a single CPU. However, it would be fairly hard to construct such a case.

For the most part, correct memory ordering is handled by the CPU's own instruction scheduling. A common case where it does matter on a single CPU is for system-level programmers altering CP15 registers. For instance, an ISB should be issued when turning on the MMU. The same may be true for certain hardware/device registers. Finally, a program loader will need barriers, as well as cache operations, even on single-CPU systems.

UnixSmurf wrote these blog posts on memory access ordering:

  • Intro
  • Barriers and the Linux kernel
  • Memory access and the ARM architecture

The topic is complex and you have to be specific about the types of barriers you are discussing.

Note 1: I say non-preemptive because if an interrupt occurs, the single CPU will probably ensure that all outstanding memory requests complete. With a non-preemptive switch, you do something like longjmp() to change threads. In theory, you could change contexts before all writes have completed. The system would then only need a DMB in yield() to avoid the problem.

artless noise answered Oct 21 '22