Callables, forwarding, rvalues, and threads

Question

How do I forward a callable which might be a rvalue and its (variadic) arguments so execution is 100% correct and reliable in respect of a to-be-spawned thread?

My guess is that the answer will be "wait on a condition variable", but I'd like to be certain.

Why am I asking this?

I've had a homebrewn threading implementation which works 100% reliably for decades, long before std::thread existed. It still works fine. Now we do have std::thread for a good while (more or less, that is... MinGW-type distro support for threading isn't so great thanks to depending on gthr/pthreads, sadly, and I absolutely need to support Win32).

std::thread has this cool feature that you can pass a lambda, all along with arbitrary arguments, and it somehow just works. That's awesome.

My implementation requires you to either derive from a base class, or pass a pointer to a state block which contains a function pointer and a couple of data pointers (which is basically the same as a lambda, but ugly, clumsy, and less flexible).

I don't really need the lambda functionality, but it's quite cool, so I thought I'd just implement it. How difficult can it be.

The verdict

The code is straightforward and it works perfectly well, except, except it doesn't. I discovered that it doesn't work after many successful runs creating two threads with inline lambdas. In other words

class thread { [...]  template<typename F, typename... A> thread(F&& f, A&&... args) : handle( detail::spawn_thread(std::forward<>(...)) ) { } };

thread a([](){...}); // works fine 100% of the time alone, but
thread b([](){...}); // one failure out of 20, if both are present

Huh, that's funny. How can this be? There isn't even enough code to have the possibility of a serious failure. Universal references are forwarded, thread is spawned with pointer to params, everything looks perfectly innocent.

Except, well except, if the other thread has not yet started by the time spawn_thread returns (which apparently happens about once in 20 times on my system). Because in that case, the new thread will attempt to read state which you've just released and overwritten.

The standard library approach

Let's see how the standard library does it!

Funnily, looking at the standard implementation (the gthr files are missing on my no-thread-available MinGW-w64, but they can be found on github) it turns out they have almost exactly the same code. Except I'm using single-character template parameters without underscores and fewer typedefs, and the standard library uses dynamic allocation.

Oh wait, dynamic allocation for the state, that's it! How bloody obvious. Let's cheat and look what exactly they're doing. (I've edited the code to make it more easily readable, removed error checking and obscuring typedefs, functionality is the same).

class thread
{
...
    struct _State { virtual void _M_run() = 0; };

    template<typename _Callable> struct _State_impl : public _State
    {
        _Callable       _M_func;
        _State_impl(_Callable&& __f) : _M_func(std::forward<_Callable>(__f)) { }
        void _M_run() { _M_func(); }
    };

    template<typename _Callable, typename... _Args> explicit thread(_Callable&& __f, _Args&&... __args)
    {
        _M_start_thread(_S_make_state(std::__bind_simple(std::forward<_Callable>(__f), std::forward<_Args>(__args)...)));
    //  _M_start_thread( unique_ptr<blah> (new blah(std::forward<>(blah))) );  // ---> OK, store state, and pass to _M_start_thread
    }

    void _M_start_thread(unique_ptr<_State> state, void (*)())
    {
        __gthread_create( ...  &execute_native_thread_routine, state.get()); // __gthread_create ==  phtread_create
        state.release();
    }
};

where:

    extern "C"
     {
        static void* execute_native_thread_routine(void* __p)
        {
          thread::_State_ptr __t { static_cast<thread::_State*>(__p) };
          __t->_M_run(); // courageous!
        }
    }

Obviously, the standard implementation is correct. If you had a 5% chance of your programs running amok, one of the several million daily users would have noticed long ago.

However, I don't understand why it is correct, or how. If you had the guarantee that the freshly spawned thread runs before the original thread returns, that would of course work. But to my knowledge, neither pthreads nor Win32, nor any other threading API provides that guarantee.

To me the approach looks like:

dynamically allocate state, assign to unique_ptr
pass raw pointer to state to pthread_create
~~release unique_ptr, deallocate state~~ This is actually why the standard implementation "works". It deliberately leaks the state (thanks for T.C. for clearing up that I confused release with reset).
eventually run execute_native_thread_routine (different thread)
~~pray~~

While I see no other reliable solution than:

dynamically allocate state
pass pointer to pthread_create
collect underpants
???
release state safely
Profit!!!

It seems as if something like waiting on an eventcount / semaphore / condition variable (whatever is available) would be necessary to be sure the thread has started, does it not? Of course that makes spawning threads painstakingly inefficient.

One might think that a shared_ptr would do the trick. But how do you successfully pass a shared_ptr through a system library interface that takes a void* and passes that raw pointer to a different thread? This isn't going to fly.

I almost feel like just leaking the state -- threads are not created in hundreds, and a few dozen bytes leaked are probably not noticeable. But then, automated analysis tools will carp at it, and if there's ever an audit, the auditor will be so very, very smart as to point out this very dangerous leak.

I had thought about alternatively just storing a std::function inside the thread class (seems reasonable, as the thread object usually lives for as long as the thread). But std::function wants to know a type that you don't have at class scope...

Is there a better (correct, and reliable) way of doing this, that is without possibly losing something on the way?

Damon · Accepted Answer

After spending half an hour on writing this question, I think I just figured out the embarrassingly simple "Duh!" type of answer for the problem myself:

Do not release the state in the calling thread at all. Instead, do it in the freshly created thread.

Since the new thread must (quite obviously) be running in order to delete the state, no further synchronization is necessary, nor are any passing around smart pointers.

For exception safety in case the lambda throws, it however doesn't hurt to wrap the raw pointer into a unique_ptr on the new-thread side, something like this:

void* thr_prc(void* st) { std::unique_ptr<call_base>{static_cast<call_base*>(st)}->exec(); }

Callables, forwarding, rvalues, and threads

Tags:

c++

c++14

Why am I asking this?

The verdict

The standard library approach

Damon

1 Answers

Damon

Recent Activity

Donate For Us

Callables, forwarding, rvalues, and threads

Tags:

c++

c++14

Why am I asking this?

The verdict

The standard library approach

Damon

1 Answers

Damon

Related questions

Recent Activity

Donate For Us