Why is there no wait function for condition_variable which does not relock the mutex

Consider the following example.

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

using namespace std::chrono_literals;  // needed for the 1s literal below

std::mutex mtx;
std::condition_variable cv;

void f()
{
  {
    std::unique_lock<std::mutex>  lock( mtx );
    cv.wait( lock );  // 1
  }
  std::cout << "f()\n";
}

void g()
{
  std::this_thread::sleep_for( 1s );
  cv.notify_one();
}

int main()
{
  std::thread  t1{ f };
  std::thread  t2{ g };
  t2.join();
  t1.join();
}

g() "knows" that f() is waiting in the scenario I would like to discuss. According to cppreference.com there is no need for g() to lock the mutex before calling notify_one. Now in the line marked "1" cv will release the mutex and relock it once the notification is sent. The destructor of lock releases it again immediately after that. This seems to be superfluous especially since locking is expensive. (I know in certain scenarios the mutex needs to be locked. But this is not the case here.)

Why does condition_variable have no function "wait_nolock" which does not relock the mutex once the notification arrives? If the answer is that pthreads do not provide such functionality: why can't pthreads be extended to provide it? Is there an alternative for realizing the desired behavior?

asked Oct 06 '15 by Claas Bontus

1 Answer

You misunderstand what your code does.

Your code on line // 1 is free to not block at all. condition_variables can (and will!) have spurious wakeups -- they can wake up for no good reason at all.

You are responsible for checking if the wakeup is spurious.

Using a condition_variable properly requires 3 things:

  • A condition_variable
  • A mutex
  • Some data guarded by the mutex

The data guarded by the mutex is modified (under the mutex). Then (with the mutex possibly disengaged), the condition_variable is notified.

On the other end, you lock the mutex, then wait on the condition variable. When you wake up, your mutex is relocked, and you test if the wakeup is spurious by looking at the data guarded by the mutex. If it is a valid wakeup, you process and proceed.

If it wasn't a valid wakeup, you go back to waiting.
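
Applied to the original example, a minimal sketch of that pattern might look like the following (the ready flag is added here purely for illustration; it plays the role of the "data guarded by the mutex"):

std::mutex mtx;
std::condition_variable cv;
bool ready = false;  // the data guarded by the mutex

void f()
{
  std::unique_lock<std::mutex> lock( mtx );
  cv.wait( lock, []{ return ready; } );  // rechecks the data, so spurious wakeups are harmless
  std::cout << "f()\n";
}

void g()
{
  {
    std::lock_guard<std::mutex> lock( mtx );
    ready = true;    // modify the guarded data under the mutex
  }
  cv.notify_one();   // notify after releasing the mutex
}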

In your case, you don't have any data guarded, you cannot distinguish spurious wakeups from real ones, and your design is incomplete.

Not surprisingly, with this incomplete design you don't see the reason why the mutex is relocked: it is relocked so you can safely check the data to see whether the wakeup was spurious or not.

If you want to know why condition variables are designed that way: probably because this design is more efficient than the "reliable" one (for whatever reason), and rather than exposing higher-level primitives, C++ exposed the lower-level, more efficient primitives.

Building a higher-level abstraction on top of this isn't hard, but there are design decisions to make. Here is one built on top of std::experimental::optional:

template<class T>
struct data_passer {
  std::experimental::optional<T> data;
  bool abort_flag = false;
  std::mutex guard;
  std::condition_variable signal;

  void send( T t ) {
    {
      std::unique_lock<std::mutex> _(guard);
      data = std::move(t);
    }
    signal.notify_one();
  }
  void abort() {
    {
      std::unique_lock<std::mutex> _(guard);
      abort_flag = true;
    }
    signal.notify_all();
  }        
  std::experimental::optional<T> get() {
    std::unique_lock<std::mutex> _(guard);
    signal.wait( _, [this]()->bool{
      return data || abort_flag;
    });
    if (abort_flag) return {};
    T retval = std::move(*data);
    data = {};
    return retval;
  }
};

Now, each send can cause a get to succeed at the other end. If more than one send occurs, only the latest one is consumed by a get. If and when abort_flag is set, get() instead immediately returns {}.

The above supports multiple consumers and producers.

An example of how the above might be used is a source of preview state (say, a UI thread), and one or more preview renderers (which are not fast enough to be run in the UI thread).

The source of preview state dumps a preview state into the data_passer<preview_state> willy-nilly. The renderers compete and one of them grabs it. Then they render it, and pass it back (through whatever mechanism).

If the preview states come faster than the renderers consume them, only the most recent one is of interest, so the earlier ones are discarded. But existing previews aren't aborted just because a new state shows up.
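
A minimal usage sketch of data_passer might look like the following; the payload type, the strings, and the sleep are illustrative assumptions rather than part of the original answer, and the usual headers plus the data_passer definition above are assumed:

// e.g. inside main()
data_passer<std::string> chan;

std::thread consumer( [&]{
  while (auto msg = chan.get())   // an empty optional signals abort
    std::cout << *msg << '\n';
} );

chan.send( "state 1" );
chan.send( "state 2" );           // may overwrite "state 1" if it was not consumed yet
std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
chan.abort();                     // wake the consumer so it can exit
consumer.join();

Note that get() checks abort_flag before data, so a message still pending when abort() is called is dropped; the sleep above merely makes that unlikely in this toy sketch.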


Questions were asked below about race conditions.

If the data being communicated is atomic, can't we do without the mutex on the "send" side?

So something like this:

template<class T>
struct data_passer {
  std::atomic<std::experimental::optional<T>> data;
  std::atomic<bool> abort_flag = false;
  std::mutex guard;
  std::condition_variable signal;

  void send( T t ) {
    data = std::move(t); // 1a
    signal.notify_one(); // 1b
  }
  void abort() {
    abort_flag = true;   // 1a
    signal.notify_all(); // 1b
  }        
  std::experimental::optional<T> get() {
    std::unique_lock<std::mutex> _(guard); // 2a
    signal.wait( _, [this]()->bool{ // 2b
      return data.load() || abort_flag.load(); // 2c
    });
    if (abort_flag.load()) return {};
    T retval = std::move(*data.load());
    // data = std::experimental::nullopt;  // doesn't make sense
    return retval;
  }
};

The above fails to work.

We start with the listening thread. It does step 2a, then waits (2b). It evaluates the condition at step 2c, but doesn't return from the lambda yet.

The broadcasting thread then does step 1a (setting the data), then signals the condition variable. At this moment, nobody is waiting on the condition variable (the code in the lambda doesn't count!).

The listening thread then finishes the lambda, and returns "spurious wakeup". It then blocks on the condition variable, and never notices that data was sent.

The std::mutex used while waiting on the condition variable must guard both the write to the data "passed" by the condition variable and the read of it (whatever test you do to determine whether the wakeup was spurious, here the check in the lambda); otherwise the possibility of "lost signals" exists. (At least in a simple implementation: more complex implementations can create lock-free paths for "common cases" and only use the mutex in a double-check. This is beyond the scope of this question.)

Using atomic variables does not get around this problem, because the two operations of "determine if the message was spurious" and "rewait in the condition variable" must be atomic with regards to the "spuriousness" of the message.
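
For contrast, here is the send() from the first data_passer again, annotated (comments mine) with why holding guard around the write closes that window:

void send( T t ) {
  {
    std::unique_lock<std::mutex> _(guard);  // while guard is held, the waiter is either
    data = std::move(t);                    // still before its predicate check (it will
  }                                         // then see the new data) or already blocked
  signal.notify_one();                      // inside wait() (it will get this notification)
}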

answered Oct 16 '22 by Yakk - Adam Nevraumont