Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to multiply two values and store the result atomically?

Say I have the following global variables in my code:

std::atomic<uint32_t> x(...);
std::atomic<uint32_t> y(...);
std::atomic<uint32_t> z(...);

My task is to multiply x and y and then store the result in z:

z = x * y

I'm aware that the naive approach of calling store() and load() on each object is completely wrong:

z.store(x.load() * y.load()); // wrong

This way I'm performing three separate atomic instructions: another thread might slip through and change one of the values in the meantime.

I could opt for a compare-and-swap (CAS) loop, but it would guarantee atomicity only for swapping the old value of z with the new one (x*y): I'm still not sure how to perform the whole operation in a single, atomic step.

I'm also aware that wrapping x, y and z inside a struct and make it atomic is not feasible here, as the struct doesn't fit inside a single 64-bit register. The compiler would employ locks under the hood (please correct me if I'm wrong here).

Is this problem solvable only with a mutex?

like image 730
Ignorant Avatar asked Jul 07 '19 10:07

Ignorant


1 Answers

I'm still not sure how to perform the whole operation in a single, atomic step.

It will only be possible to do so if you architecture supports something like "32-bit atomic multiplication" (and you would have to do it outside the C++ standard's facilities) or an atomic that is wide enough to perform a RMW operation on 64-bits.

I'm also aware that wrapping x, y and z inside a struct and make it atomic is not feasible here, as the struct doesn't fit inside a single 64-bit register.

Even if they would fit, you would still need to do a RMW operation, since it is unlikely you have atomic multiplication anyway.

Is this problem solvable only with a mutex?

If your architecture supports a lock-free 64-bit atomic (check with is_always_lock_free), you can keep both x and y together and perform operations on it as needed.

what if the variables were uint64_t instead, or the operation was way more complex like x * y * w * k / j?

Assuming that your architecture does not have 128-bit lock-free atomics, then you cannot load that much data atomically. Either design your program so that it does not need the (complete) operation to be atomic to begin with, use locks or seek a way to avoid sharing the state.

Note that even if you perceive some operations as atomic, you have to realize that in an SMP system you are putting pressure on the cache hierarchy anyway.

like image 155
Acorn Avatar answered Nov 11 '22 16:11

Acorn