Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

std::mutex::try_lock spuriously fail?

Tags:

c++

mutex

locking

Maybe I'm misunderstanding about std::mutex::try_lock:

This function is allowed to fail spuriously and return false even if the mutex is not currently locked by any other thread.

This means that if no one thread has a lock on that mutex, when I try a try_lock it could return false? For what purpose?

Isn't the function of try_lock return false if its locked OR true if nobody lock it? Not really sure if my non-native english is fooling me...

like image 563
markzzz Avatar asked Jan 17 '18 15:01

markzzz


People also ask

What is Try_lock C++?

The function calls the try_lock member function for each argument (first a , then b , and eventually the others in cde , in the same order), until either all calls are successful, or as soon as one of the calls fails (either by returning false or throwing an exception).

How do I lock my mutex?

Use pthread_mutex_lock(3THR) to lock the mutex pointed to by mutex . When pthread_mutex_lock() returns, the mutex is locked and the calling thread is the owner. If the mutex is already locked and owned by another thread, the calling thread blocks until the mutex becomes available.

Are mutexes fair?

On windows e.g. mutexes are mostly fair, but not always. Some implementations e.g. Thread Building Block provide special mutexes that are fair, but these are not based on the OSes native mutexes, and are usually implemented as spin-locks (which have their own caveats).

How do I use mutex in CPP?

A mutex is initialized in the beginning of the main function. The same mutex is locked in the 'trythis()' function while using the shared resource 'counter'. At the end of the function 'trythis()' the same mutex is unlocked. At the end of the main function when both the threads are done, the mutex is destroyed.


4 Answers

This means that if no one thread has a lock of that mutex, when I try a try_lock, it could return false?

Yes, that's exactly what it says.

Isn't the function of try_lock return false if its locked OR true if nobody lock it?

No, the function of try_lock is to try to lock the mutex.

However, there is more than one way it can fail:

  1. the mutex is already locked elsewhere (this is the one you're thinking of)
  2. some platform-specific feature interrupts or prevents the locking attempt, and control is returned to the caller who can decide whether to retry.

The common case on POSIX-ish platforms, and inherited from POSIX threads, is that a signal is delivered to (and handled by a signal handler in) the current thread, interrupting the lock attempt.

There may be other platform-specific reasons on other platforms, but the behaviour is the same.

like image 122
Useless Avatar answered Oct 05 '22 09:10

Useless


Based on your comments, I would write (quoting your words):

std::unique_lock<std::mutex> lock(m, std::defer_lock); // m being a mutex
...
if (lock.try_lock()) {
  ... // "DO something if nobody has a lock"
} else {
  ... // "GO AHEAD"
}

Note that lock.try_lock() effectively calls m.try_lock(), therefore it is prone to spurious fail as well. But I wouldn't care much about this issue. IMO, in practice, spurious fails/wakeups are quite rare (as Useless pointed out, on Linux, they can happen when a signal is delivered).

More about spurious issues, see e.g.: https://en.wikipedia.org/wiki/Spurious_wakeup or Why does pthread_cond_wait have spurious wakeups?.

UPDATE

If you really want to eliminate spurious fail of try_lock, you can use some atomic flag such as:

// shared by threads:
std::mutex m;  
std::atomic<bool> flag{false};

// within threads:
std::unique_lock<std::mutex> lock(m, std::defer_lock); // m being a mutex
...
while (true) {
  lock.try_lock();
  if (lock.owns_lock()) {
    flag = true;
    ... // "DO something if nobody has a lock"    
    flag = false;
    break;
  } else if (flag == true) {
    ... // "GO AHEAD"
    break;
  }
}

It may be possibly rewritten to better form, I didn't check. Also, note that flag is not automatically unset via RAII, some scope guard may be useful here.

UPDATE 2

If you do not need also the blocking functionality of mutex, use std::atomic_flag:

std::atomic_flag lock = ATOMIC_FLAG_INIT;

// within threads:
if (lock.test_and_set()) {
    ... // "DO something if nobody has a lock"    
    lock.clear();
} else {
    ... // "GO AHEAD"
}

Just, again, clearing the flag would be better via some RAII mechanism.

like image 43
Daniel Langr Avatar answered Oct 05 '22 09:10

Daniel Langr


Unlike was said there, I don't think there is any reason for a try_lock function to failed due to OS-related reasons: such operation is non-blocking, so signals cannot really interrupt it. Most likely it has everything to do with how this function is implemented on CPU-level. After all, uncontested case is usually the most interesting one for a mutex.

Mutex locking usually requires some form of atomic compare exchange operation. C++11 and C11 introduce atomic_compare_exchange_strong and atomic_compare_exchange_weak. The latter is allowed to fail spuriously.

By allowing try_lock to fail spuriously, implementations are allowed to use atomic_compare_exchange_weak to maximize performance and minimize code size.

For example on ARM64 atomic operations are usually implemented using exclusive-load (LDXR) and exclusive-store (STRX) instructions. LDXR fires up "monitor" hardware which starts tracking all accesses to a memory region. STRX only performs the store if no accesses to that region were made between LDXR and STRX instructions. Thus the whole sequence can fail spuriously if another thread accesses that memory region or if there was an IRQ in between the two.

In practice, code generate for try_lock implemented using weak guarantee is not very different from the one implemented using strong guarantee.

bool mutex_trylock_weak(atomic_int *mtx)
{
    int old = 0;
    return atomic_compare_exchange_weak(mtx, &old, 1);
}

bool mutex_trylock_strong(atomic_int *mtx)
{
    int old = 0;
    return atomic_compare_exchange_strong(mtx, &old, 1);
}

Take a look at generated assembly for ARM64:

mutex_trylock_weak:
  sub sp, sp, #16
  mov w1, 0
  str wzr, [sp, 12]
  ldaxr w3, [x0]      ; exclusive load (acquire)
  cmp w3, w1
  bne .L3
  mov w2, 1
  stlxr w4, w2, [x0]  ; exclusive store (release)
  cmp w4, 0           ; the only difference is here
.L3:
  cset w0, eq
  add sp, sp, 16
  ret

mutex_trylock_strong:
  sub sp, sp, #16
  mov w1, 0
  mov w2, 1
  str wzr, [sp, 12]
.L8:
  ldaxr w3, [x0]      ; exclusive load (acquire)
  cmp w3, w1
  bne .L9
  stlxr w4, w2, [x0]  ; exclusive store (release)
  cbnz w4, .L8        ; the only difference is here
.L9:
  cset w0, eq
  add sp, sp, 16
  ret

The only difference is that "weak" version eliminates conditional backward branch cbnz w4, .L8 and replaces it with cmp w4, 0. Backward condition branches are predicted by CPU as "will-be-taken" in the absense of branch prediction information as they are assumed to be part of a loop - such assumption is wrong in this case as most of the time lock will be acquired (low contention is assumed to be the most common case).

Imo this is the only performance difference between those functions. "Strong" version can basically suffer from 100% branch misprediction ratio on single instruction under some workloads.

By the way, ARMv8.1 introduces atomic instructions, so there is no difference between the two, just like on x86_64. Code generated with -march=armv8.1-a flag:

  sub sp, sp, #16
  mov w1, 0
  mov w2, 1
  mov w3, w1
  str wzr, [sp, 12]
  casal w3, w2, [x0]
  cmp w3, w1
  cset w0, eq
  add sp, sp, 16
  ret

Some try_lock functions can fail even when atomic_compare_exchange_strong is used, for example try_lock_shared of shared_mutex might need to increment reader counter and might fail if another reader has entered the lock. "Strong" variant of such function would need to generate a loop and thus can suffer from a similar branch mispredication.

Another minor detail: if mutex is written in C, some compilers (like Clang) might align loop at 16-byte boundary to improve its performance, bloating function body with padding. This is unnecessary if loop almost always runs a single time.


Another reason for spurious failure is a failure to acquire internal mutex lock (if mutex is implemented using a spinlock and some kernel primitive). In theory the same principle could be acquired in the kernel implementation of try_lock, although this does not seems reasonable.

like image 37
StaceyGirl Avatar answered Oct 05 '22 09:10

StaceyGirl


In paper Foundations of the C++ Concurrency Memory Model Section 3, there is already a clear explanation for why the standard allows spurious failures of try_lock. In short, it is specified to make the semantics of try_lock be consistent with the definition of race in C++ memory model.

like image 41
chain ro Avatar answered Oct 05 '22 07:10

chain ro