I must be just having a moment, because this should be easy but I can't seem to get it working right. Whats the correct way to implement an atomic counter in GCC? i.e. I want a counter that runs from zero to 4 and is thread safe. I was doing this (which is further wrapped in a class, but not here) <pre class="prettyprint"><code>static volatile int _count = 0; const int limit = 4; int get_count(){ // Create a local copy of diskid int save_count = __sync_fetch_and_add(&_count, 1); if (save_count >= limit){ __sync_fetch_and_and(&_count, 0); // Set it back to zero } return save_count; } </code></pre> But it's running from 1 through from 1 - 4 inclusive then around to zero. It should go from 0 - 3. Normally I'd do a counter with a mod operator but I don't know how to do that safely. Perhaps this version is better. Can you see any problems with it, or offer a better solution. <pre class="prettyprint"><code>int get_count(){ // Create a local copy of diskid int save_count = _count; if (save_count >= limit){ __sync_fetch_and_and(&_count, 0); // Set it back to zero return 0; } return save_count; } </code></pre> Actually, I should point out that it's not absolutely critical that each thread get a different value. If two threads happened to read the same value at the same time that wouldn't be a problem. But they can't exceed limit at any time.

Your code isn't atomic (and your second <code>get_count</code> doesn't even increment the counter value)! Say <code>count</code> is 3 at the start and two threads simultaneously call <code>get_count</code>. One of them will get his atomic add done first and increments <code>count</code> to 4. If the second thread is fast enough, it can increment it to <code>5</code> before the first thread resets it to zero. Also, in your wraparound processing, you reset <code>count</code> to 0 but not <code>save_count</code>. This is clearly not what's intended. This is easiest if <code>limit</code> is a power of 2. Don't ever do the reduction yourself, just use <pre class="prettyprint"><code>return (unsigned) __sync_fetch_and_add(&count, 1) % (unsigned) limit; </code></pre> or alternatively <pre class="prettyprint"><code>return __sync_fetch_and_add(&count, 1) & (limit - 1); </code></pre> This only does one atomic operation per invocation, is safe and very cheap. For generic limits, you can still use <code>%</code>, but that will break the sequence if the counter ever overflows. You can try using a 64-bit value (if your platform supports 64-bit atomics) and just hope it never overflows; this is a bad idea though. The proper way to do this is using an atomic compare-exchange operation. You do this: <pre class="prettyprint"><code>int old_count, new_count; do { old_count = count; new_count = old_count + 1; if (new_count >= limit) new_count = 0; // or use % } while (!__sync_bool_compare_and_swap(&count, old_count, new_count)); </code></pre> This approach generalizes to more complicated sequences and update operations too. That said, this type of lockless operation is tricky to get right, relies on undefined behavior to some degree (all current compilers get this right, but no C/C++ standard before C++0x actually has a well-defined memory model) and is easy to break. I recommend using a simple mutex/lock unless you've profiled it and found it to be a bottleneck.

Atomic counter in gcc

Tags:

c++

gcc

I must be just having a moment, because this should be easy but I can't seem to get it working right.

Whats the correct way to implement an atomic counter in GCC?

i.e. I want a counter that runs from zero to 4 and is thread safe.

I was doing this (which is further wrapped in a class, but not here)

static volatile int _count = 0;
const int limit = 4;

int get_count(){
  // Create a local copy of diskid
  int save_count = __sync_fetch_and_add(&_count, 1);
  if (save_count >= limit){
      __sync_fetch_and_and(&_count, 0); // Set it back to zero
  }
  return save_count;
}

But it's running from 1 through from 1 - 4 inclusive then around to zero.
It should go from 0 - 3. Normally I'd do a counter with a mod operator but I don't know how to do that safely.

Perhaps this version is better. Can you see any problems with it, or offer a better solution.

int get_count(){
   // Create a local copy of diskid
   int save_count = _count;
   if (save_count >= limit){
      __sync_fetch_and_and(&_count, 0); // Set it back to zero
      return 0;
   }

   return save_count;
 }

Actually, I should point out that it's not absolutely critical that each thread get a different value. If two threads happened to read the same value at the same time that wouldn't be a problem. But they can't exceed limit at any time.

951

asked Nov 12 '10 03:11

hookenz

1 Answers

Your code isn't atomic (and your second get_count doesn't even increment the counter value)!

Say count is 3 at the start and two threads simultaneously call get_count. One of them will get his atomic add done first and increments count to 4. If the second thread is fast enough, it can increment it to 5 before the first thread resets it to zero.

Also, in your wraparound processing, you reset count to 0 but not save_count. This is clearly not what's intended.

This is easiest if limit is a power of 2. Don't ever do the reduction yourself, just use

return (unsigned) __sync_fetch_and_add(&count, 1) % (unsigned) limit;

or alternatively

return __sync_fetch_and_add(&count, 1) & (limit - 1);

This only does one atomic operation per invocation, is safe and very cheap. For generic limits, you can still use %, but that will break the sequence if the counter ever overflows. You can try using a 64-bit value (if your platform supports 64-bit atomics) and just hope it never overflows; this is a bad idea though. The proper way to do this is using an atomic compare-exchange operation. You do this:

int old_count, new_count;
do {
  old_count = count;
  new_count = old_count + 1;
  if (new_count >= limit) new_count = 0; // or use %
} while (!__sync_bool_compare_and_swap(&count, old_count, new_count));

This approach generalizes to more complicated sequences and update operations too.

That said, this type of lockless operation is tricky to get right, relies on undefined behavior to some degree (all current compilers get this right, but no C/C++ standard before C++0x actually has a well-defined memory model) and is easy to break. I recommend using a simple mutex/lock unless you've profiled it and found it to be a bottleneck.

141

answered Nov 07 '22 16:11

Fabian Giesen

Related questions
                            
                                Can a non-type template parameter be of type "void*"?
                            
                                Can span be constexpr?
                            
                                What is an equivalent replacement for std::unary_function in C++17?
                            
                                Write large file
                            
                                What is the optimization level ( g++ ) you use while comparing two different algorithms written in C++ ?
                            
                                building boost python examples using Visual Studio 2008
                            
                                2D vector modelling for game development
                            
                                What exactly is a PWSTR and why use this naming compared to char*, std::string, or CString in C++?
                            
                                C++ hooking a dll?
                            
                                How to convert between local and universal time using boost::date_time?
                            
                                why weak_ptr can break cyclic reference?
                            
                                Emacs, Cedet and semantic
                            
                                How do you implement syntax highlighting?
                            
                                gcc optimization? bug? and its practial implication to project
                            
                                Unresolved external symbols in compiling 32 bit application in Windows 64
                            
                                Will std::multimap preserve the insert order if the key of 2 elements equal to each other?
                            
                                Where to put BOOST_CLASS_EXPORT for boost::serialization?
                            
                                Choice of the most performant container (array)
                            
                                C++: error "explicit specialization in non-namespace scope"
                            
                                Is string::c_str() allowed to allocate anything on the heap?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With