Why is atomic.StoreUint32 preferred over a normal assignment in sync.Once?

Question

While reading the Go source code, I came across this code in src/sync/once.go and have a question about it:

func (o *Once) Do(f func()) {
    // Note: Here is an incorrect implementation of Do:
    //
    //  if atomic.CompareAndSwapUint32(&o.done, 0, 1) {
    //      f()
    //  }
    //
    // Do guarantees that when it returns, f has finished.
    // This implementation would not implement that guarantee:
    // given two simultaneous calls, the winner of the cas would
    // call f, and the second would return immediately, without
    // waiting for the first's call to f to complete.
    // This is why the slow path falls back to a mutex, and why
    // the atomic.StoreUint32 must be delayed until after f returns.

    if atomic.LoadUint32(&o.done) == 0 {
        // Outlined slow-path to allow inlining of the fast-path.
        o.doSlow(f)
    }
}

func (o *Once) doSlow(f func()) {
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 {
        defer atomic.StoreUint32(&o.done, 1)
        f()
    }
}

Why is atomic.StoreUint32 used, rather than, say, o.done = 1? Are these not equivalent? What are the differences?

Must we use the atomic operation (atomic.StoreUint32) to make sure that other goroutines can observe the effect of f() before o.done is set to 1 on a machine with a weak memory model?

JimB · Accepted Answer

Remember, unless you are writing the assembly by hand, you are not programming to your machine's memory model; you are programming to Go's memory model. This means that even if primitive assignments are atomic on your architecture, Go requires the use of the atomic package to ensure correctness across all supported architectures.
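As a rough sketch of what programming to Go's memory model looks like in practice: under that model, an atomic load that observes the value written by an atomic store is synchronized after that store, so ordinary writes made before the store are visible after the load. The helper name publishAndRead and the variables data and ready below are hypothetical, chosen only for illustration; they do not appear in sync.Once.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// publishAndRead is a hypothetical helper (not part of sync.Once).
// A writer goroutine sets an ordinary variable and then publishes a flag
// with an atomic store; the caller waits for the flag with atomic loads
// and only then reads the ordinary variable.
func publishAndRead() int {
	var data int     // ordinary variable, written before the flag is set
	var ready uint32 // flag, accessed only through sync/atomic

	go func() {
		data = 42                     // plain write
		atomic.StoreUint32(&ready, 1) // publishes the write above
	}()

	// Spin until the atomic load observes the store, yielding in between.
	for atomic.LoadUint32(&ready) == 0 {
		runtime.Gosched()
	}
	return data // safe: ordered after the writer's atomic store
}

func main() {
	fmt.Println(publishAndRead()) // prints 42
}
```

The done flag in sync.Once plays the same role as ready here: once the fast path's atomic.LoadUint32 sees 1, everything f() wrote before the atomic.StoreUint32 is visible to the caller.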

Access to the done flag outside of the mutex only needs to be safe, not strictly ordered, so atomic operations can be used instead of always obtaining a lock with a mutex. This is an optimization to make the fast path as efficient as possible, allowing sync.Once to be used in hot paths.

The mutex used for doSlow is for mutual exclusion within that function alone, to ensure that only one caller ever makes it to f() before the done flag is set. The flag is written using atomic.StoreUint32, because it may happen concurrently with atomic.LoadUint32 outside of the critical section protected by the mutex.

Reading the done field with a plain (non-atomic) read concurrently with writes, even atomic writes, is a data race. Likewise, just because the field is read atomically does not mean you can use a normal assignment to write it. Hence the flag is checked first with atomic.LoadUint32 and written with atomic.StoreUint32.

The direct read of done within doSlow is safe, because it is protected from concurrent writes by the mutex. Reading the value concurrently with atomic.LoadUint32 is safe because both are read operations.
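To see these rules working together, here is a minimal sketch that reuses the structure quoted in the question. The type onceSketch and the helper runConcurrently are hypothetical names introduced for this example; the real implementation is sync.Once, and this is not a replacement for it.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// onceSketch is a hypothetical, simplified stand-in for sync.Once.
type onceSketch struct {
	done uint32
	m    sync.Mutex
}

// Do runs f at most once; every call returns only after f has finished.
func (o *onceSketch) Do(f func()) {
	// Fast path: runs outside the mutex, so the read must be atomic.
	if atomic.LoadUint32(&o.done) == 0 {
		o.doSlow(f)
	}
}

func (o *onceSketch) doSlow(f func()) {
	o.m.Lock()
	defer o.m.Unlock()
	// A plain read is fine here: every write to done happens under o.m.
	if o.done == 0 {
		// The write must be atomic because fast-path loads may run
		// concurrently with it. It is deferred so it happens after f returns.
		defer atomic.StoreUint32(&o.done, 1)
		f()
	}
}

// runConcurrently calls Do from n goroutines. It reports how many times f
// ran and how many callers saw f's effect after their Do call returned.
func runConcurrently(n int) (calls int, seen int) {
	var o onceSketch
	var value int
	var sawValue int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			o.Do(func() {
				calls++ // executed at most once, under o.m
				value = 42
			})
			// Ordered after f by either the atomic load or the mutex.
			if value == 42 {
				atomic.AddInt32(&sawValue, 1)
			}
		}()
	}
	wg.Wait()
	return calls, int(sawValue)
}

func main() {
	calls, seen := runConcurrently(8)
	fmt.Println(calls, seen) // prints "1 8"
}
```

Replacing the deferred atomic.StoreUint32 with a plain o.done = 1 in this sketch would leave an unsynchronized write racing with the fast-path load, which is the kind of access the Go race detector (go run -race) is designed to report.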

Tags: memory-model, atomic, concurrency, go

kingwah001
