 

Wait for a detached thread to finish in C++

How can I wait for a detached thread to finish in C++?

I don't care about an exit status, I just want to know whether or not the thread has finished.

I'm trying to provide a synchronous wrapper around an asynchronous third-party tool. The problem is a weird intermittent crash, apparently a race condition, involving a callback. The sequence is:

  1. I call the third-party tool and register a callback.
  2. When the third-party tool finishes, it notifies me through the callback, in a detached thread I have no real control over.
  3. I want the thread from (1) to wait until (2) is called.

I want to wrap this in a mechanism that provides a blocking call. So far, I have:

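#include <pthread.h>

class Wait {
  public:
  Wait() : m_done(false) {
    // The mutex, condition variable, and flag must be initialized before use.
    pthread_mutex_init(&m_mutex, NULL);
    pthread_cond_init(&m_cond, NULL);
  }

  ~Wait() {
    pthread_cond_destroy(&m_cond);
    pthread_mutex_destroy(&m_mutex);
  }

  // Invoked by the third-party library, on its own detached thread.
  void callback() {
    pthread_mutex_lock(&m_mutex);
    m_done = true;
    pthread_cond_broadcast(&m_cond);
    pthread_mutex_unlock(&m_mutex);
  }

  // Blocks the calling thread until callback() has run.
  void wait() {
    pthread_mutex_lock(&m_mutex);
    while (!m_done) {  // the loop guards against spurious wakeups
      pthread_cond_wait(&m_cond, &m_mutex);
    }
    pthread_mutex_unlock(&m_mutex);
  }

  private:
  pthread_mutex_t m_mutex;
  pthread_cond_t  m_cond;
  bool            m_done;
};

// elsewhere...
Wait waiter;                  // stack object: destroyed when this scope ends
thirdparty_utility(&waiter);
waiter.wait();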

As far as I can tell this should work, and it usually does, but sometimes it crashes. Based on the core file, my best guess at the problem is this:

  1. When the callback sets m_done and broadcasts on the condition variable, the waiting thread wakes up.
  2. The waiting thread returns from wait(), and the Wait object is destroyed along with all of its members, including the mutex and the condition variable.
  3. The callback thread continues from the broadcast point, but it is now using memory that has been released, which corrupts memory.
  4. When the callback thread tries to return (above the level of my poor callback method), the program crashes (usually with a SIGSEGV, but I've seen SIGILL a couple of times).

I've tried a lot of different mechanisms to try to fix this, but none of them solve the problem. I still see occasional crashes.

EDIT: More details:

This is part of a massively multithreaded application, so creating a static Wait isn't practical.

I ran a test creating Wait on the heap and deliberately leaking the memory (i.e. the Wait objects are never deallocated), and that produced no crashes. So I'm sure the problem is Wait being deallocated too soon.

I've also tried a test with a sleep(5) after the unlock in wait, and that also produced no crashes. I hate to rely on a kludge like that though.

EDIT: Third-party details:

I didn't think this was relevant at first, but the more I think about it, the more I think it's the real problem:

The third-party tool I mentioned, and the reason I have no control over the thread, is CORBA.

So, it's possible that CORBA is holding onto a reference to my object longer than intended.

asked Nov 15 '09 at 02:11 by Tim




1 Answer

Yes, I believe what you're describing is happening (a race condition on deallocation). One quick fix is to create a static instance of Wait, one that never gets destroyed. This works as long as you never need more than one waiter at the same time.

That instance's memory is never freed, but that doesn't look like much of a cost here.
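A minimal sketch of the static-instance suggestion, assuming C++11 threading primitives instead of raw pthreads. `StaticWait`, `run()` and its `start_async` parameter are hypothetical names: `start_async` stands in for the third-party call (the real library in the question takes a `Wait*`, not a function). The sketch also enforces the one-waiter-at-a-time restriction with a second mutex rather than leaving it to the callers.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>

// Hypothetical sketch of the "static Wait" idea. The single instance is
// allocated once and deliberately never destroyed, so a late callback can
// never touch freed memory. Callers are serialized, so only one waiter
// uses it at a time.
class StaticWait {
 public:
  static StaticWait& instance() {
    // Leaked on purpose: a plain static object would still be destroyed at exit.
    static StaticWait* w = new StaticWait();
    return *w;
  }

  // `start_async` stands in for the third-party call: it must arrange for
  // the `done` function it is given to be called exactly once, on any thread.
  void run(const std::function<void(std::function<void()>)>& start_async) {
    std::lock_guard<std::mutex> one_at_a_time(m_serialize);
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_done = false;  // reset state left over from the previous caller
    }
    start_async([this] { signal(); });
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this] { return m_done; });
  }

 private:
  StaticWait() = default;

  // Runs on the callback thread.
  void signal() {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_done = true;
    m_cond.notify_all();
  }

  std::mutex m_serialize;  // held for the whole of run()
  std::mutex m_mutex;      // protects m_done
  std::condition_variable m_cond;
  bool m_done = false;
};
```

Serializing callers means concurrent requests queue up behind one another, which may be too costly in a massively multithreaded application like the one in the question. It also assumes the library invokes the callback exactly once per request; a duplicate invocation would wake the next caller early.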

The underlying issue is that it's hard to coordinate the lifetime of a thread synchronization object across threads: you always need at least one longer-lived object to signal when it is safe to destroy the others (at least in languages without garbage collection, like C++).

EDIT: See comments for some ideas about refcounting with a global mutex.
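The comments themselves aren't reproduced here, but a minimal sketch of the refcounting idea might look like this. It assumes C++11 `std::mutex`/`std::condition_variable` rather than raw pthreads, and it assumes the library stops touching the object once `callback()` returns. `SharedWait`, `create()`, `release()` and `live()` are hypothetical names. The object starts with two references, one for the waiting thread and one for the callback; each side drops its reference when finished, and whichever side drops the last one deletes the object, so neither side can free memory the other is still using.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch: a waiter shared by exactly two parties (the waiting
// thread and the callback thread). Whichever party releases last deletes it.
class SharedWait {
 public:
  // Objects must live on the heap, because the last user deletes them.
  static SharedWait* create() { return new SharedWait(); }

  // Number of SharedWait objects currently alive (debugging aid only).
  static std::atomic<int>& live() {
    static std::atomic<int> count{0};
    return count;
  }

  // Called once from the callback thread. The caller must not touch the
  // object after this returns.
  void callback() {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_done = true;
      m_cond.notify_all();
    }  // m_mutex is unlocked here, before this side gives up its reference
    release();
  }

  // Called once from the waiting thread; blocks until callback() has run.
  void wait() {
    {
      std::unique_lock<std::mutex> lock(m_mutex);
      m_cond.wait(lock, [this] { return m_done; });
    }
    release();
  }

 private:
  SharedWait() { ++live(); }
  ~SharedWait() { --live(); }

  // One global mutex guards the reference counts of all SharedWait objects.
  static std::mutex& refcount_mutex() {
    static std::mutex m;
    return m;
  }

  // Drops one reference; the party that drops the last one deletes the object.
  void release() {
    bool last;
    {
      std::lock_guard<std::mutex> lock(refcount_mutex());
      assert(m_refs > 0 && "released more than twice");
      last = (--m_refs == 0);
    }
    if (last) delete this;
  }

  std::mutex m_mutex;
  std::condition_variable m_cond;
  bool m_done = false;
  int m_refs = 2;  // one reference for the waiter, one for the callback
};
```

The waiting thread calls `SharedWait::create()`, hands the pointer to the library, then calls `wait()` and never touches the pointer again. A `std::atomic<int>` count (or a `std::shared_ptr` held by the callback) would work equally well in place of the global mutex; the mutex version just mirrors the suggestion above. If CORBA keeps using the object after invoking the callback, as the question suspects, no scheme inside the waiter can fix that; the reference has to be held until the library itself lets go of it.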

answered Oct 06 '22 at 01:10 by Adam Goode