
atomic fetch_add vs add performance

The code below demonstrates a curiosity of multi-threaded programming: the performance of a std::memory_order_relaxed increment versus a regular increment within a single thread. What I do not understand is why fetch_add(relaxed), run single-threaded, is twice as slow as a regular increment.

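#include <atomic>
#include <benchmark/benchmark.h>

static void BM_IncrementCounterLocal(benchmark::State& state) {
  // Initialized explicitly: before C++20 a default-constructed atomic holds an indeterminate value.
  volatile std::atomic_int val2{0};

  while (state.KeepRunning()) {
    for (int i = 0; i < 10; ++i) {
      // Relaxed atomic read-modify-write; DoNotOptimize keeps the result alive.
      benchmark::DoNotOptimize(val2.fetch_add(1, std::memory_order_relaxed));
    }
  }
}
BENCHMARK(BM_IncrementCounterLocal)->ThreadRange(1, 8);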

static void BM_IncrementCounterLocalInt(benchmark::State& state) {
  volatile int val3 = 0;

  while (state.KeepRunning()) {
    for (int i = 0; i < 10; ++i) {
      // Plain volatile increment: a separate load and store, with no atomicity guarantee.
      benchmark::DoNotOptimize(++val3);
    }
  }
}
BENCHMARK(BM_IncrementCounterLocalInt)->ThreadRange(1, 8);

Output:

      Benchmark                               Time(ns)    CPU(ns) Iterations
      ----------------------------------------------------------------------
      BM_IncrementCounterLocal/threads:1            59         60   11402509                                 
      BM_IncrementCounterLocal/threads:2            30         61   11284498                                 
      BM_IncrementCounterLocal/threads:4            19         62   11373100                                 
      BM_IncrementCounterLocal/threads:8            17         62   10491608

      BM_IncrementCounterLocalInt/threads:1         31         31   22592452                                 
      BM_IncrementCounterLocalInt/threads:2         15         31   22170842                                 
      BM_IncrementCounterLocalInt/threads:4          8         31   22214640                                 
      BM_IncrementCounterLocalInt/threads:8          9         31   21889704  
Roman asked Jan 07 '16 16:01



1 Answer

With the volatile int, the compiler must ensure that it does not optimize away and/or reorder any reads/writes of the variable.

With the fetch_add, the CPU must take precautions that the read-modify-write operation is atomic.

These are two completely different requirements. Atomicity means that the CPU has to coordinate with the other CPUs in your machine so that none of them reads or writes the given memory location between its own read and its write. If the compiler implements fetch_add with a compare-and-swap instruction, it actually emits a short loop to handle the case where some other CPU modified the value in between.
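The compare-and-swap loop mentioned above can be sketched roughly as follows. This is only an illustration, not what any particular compiler emits: the names fetch_add_via_cas and demo_fetch_add_via_cas are hypothetical helpers made up for this example, and on targets with a native atomic add instruction the standard fetch_add will usually not need such a loop at all.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not a standard function): emulates fetch_add with a
// compare-and-swap retry loop. Like std::atomic<int>::fetch_add, it returns
// the value held before the addition.
int fetch_add_via_cas(std::atomic<int>& target, int delta) {
  int expected = target.load(std::memory_order_relaxed);
  // compare_exchange_weak stores expected + delta only if target still holds
  // `expected`. On failure it overwrites `expected` with the current value,
  // so the next attempt is computed from fresh data.
  while (!target.compare_exchange_weak(expected, expected + delta,
                                       std::memory_order_relaxed,
                                       std::memory_order_relaxed)) {
    // Another thread changed the value, or the CAS failed spuriously: retry.
  }
  return expected;
}

// Usage sketch: three additions of 5, starting from 0.
int demo_fetch_add_via_cas() {
  std::atomic<int> counter{0};
  for (int i = 0; i < 3; ++i) {
    int previous = fetch_add_via_cas(counter, 5);
    assert(previous == i * 5);  // each call returns the pre-addition value
  }
  return counter.load(std::memory_order_relaxed);
}
```

Even in a single thread the loop body runs at least once per increment, and the compare-and-swap itself is an atomic read-modify-write, which is the extra work the paragraph above describes.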

For the volatile int, no such communication is necessary. Instead, volatile requires that the compiler does not invent any reads: volatile was designed for single-threaded communication with hardware registers, where the mere act of reading a value may have side effects.
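To make the contrast concrete, here is a minimal sketch of the two kinds of increment side by side. The function names are hypothetical and chosen only for this example. The comments describe what x86-64 compilers typically generate, which is an assumption about a common target rather than something the standard guarantees.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: increments a volatile int n times.
// The compiler must perform every read and every write, but nothing makes the
// read-modify-write indivisible. On x86-64 this is typically an ordinary
// load/add/store sequence without a lock prefix.
int increment_volatile(int n) {
  assert(n >= 0);
  volatile int counter = 0;
  for (int i = 0; i < n; ++i) {
    counter = counter + 1;  // one volatile read, then one volatile write
  }
  return counter;
}

// Hypothetical helper: increments an atomic int n times with relaxed ordering.
// Relaxed ordering drops the ordering guarantees, not the atomicity: on x86-64
// this is typically still a lock-prefixed instruction.
int increment_atomic_relaxed(int n) {
  assert(n >= 0);
  std::atomic<int> counter{0};
  for (int i = 0; i < n; ++i) {
    counter.fetch_add(1, std::memory_order_relaxed);
  }
  return counter.load(std::memory_order_relaxed);
}
```

Run in a single thread, both functions return n; they differ only in what the hardware has to guarantee along the way, which is where the timing gap in the question comes from.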

cmaster - reinstate monica answered Oct 27 '22 17:10
