
Why does pthread_exit() in rare cases cause a SEGV when called after pthread_detach()?

I am getting a SEGV in C++ that I cannot easily reproduce (it occurs in about one in 100,000 test runs) in my call to pthread_join() as my application is shutting down. I checked the value of errno and it is zero. This is running on CentOS 4.

Under what conditions would pthread_join() get a SEGV? This might be some kind of race condition since it is extremely rare. One person suggests I should not be calling pthread_detach() and pthread_exit(), but I am not clear on why.

My first working hypothesis was that pthread_join() is being called while pthread_exit() is still running in the other thread and that this somehow leads to a SEGV, but many have stated this is not an issue.

The failing code getting SEGV in the main thread during application exit looks roughly like this (with error return code checking omitted for brevity):

// During application startup, this function is called to create the child thread:

return_val = pthread_create(&_threadId, &attr,
                            (void *(*)(void *))initialize,
                            (void *)this);

// Apparently this next line is the issue:
return_val = pthread_detach(_threadId);

// Later during exit the following code is executed in the main thread:

// This main thread waits for the child thread exit request to finish:

// Release condition so child thread will exit:
releaseCond(mtx(), startCond(), &startCount);

// Wait until the child thread is done exiting so we don't delete memory it is
// using while it is shutting down.
waitOnCond(mtx(), endCond(), &endCount, 0);
// The above wait completes at the point that the child thread is about
// to call pthread_exit().

// It is unspecified whether a thread that has exited but remains unjoined
// counts against {PTHREAD_THREADS_MAX}, hence we must do pthread_join() to
// avoid possibly leaking the threads we destroy.
pthread_join(_threadId, NULL); // SEGV in here!!!

The child thread which is being joined on exit runs the following code which begins at the point above where releaseCond() is called in the main thread:

// Wait for main thread to tell us to exit:
waitOnCond(mtx(), startCond(), &startCount);

// Tell the main thread we are done so it will do pthread_join():
releaseCond(mtx(), endCond(), &endCount);
// At this point the main thread could call pthread_join() while we 
// call pthread_exit().

pthread_exit(NULL);

The thread appeared to come up properly: no error codes were produced when it was created during application startup, and it performed its task correctly, which took around five seconds, before the application exited.

What might cause this rare SEGV to occur, and how might I program defensively against it? One claim is that my call to pthread_detach() is the issue. If so, how should my code be corrected?

WilliamKF asked Oct 07 '22 06:10


1 Answer

Assuming:

  1. pthread_create returns zero (you are checking it, right?)
  2. attr is a valid pthread_attr_t object (How are you creating it? Why not just pass NULL instead?)
  3. attr does not specify that the thread is to be created detached
  4. You did not call pthread_detach or pthread_join on the thread somewhere else

...then it is "impossible" for pthread_join to fail, and you either have some other memory corruption or a bug in your runtime.
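The assumptions above can be checked in code. Here is a minimal sketch, assuming hypothetical names (`check`, `worker`, `attr_is_joinable`, `run_checked`) that do not appear in the question. Note that pthread functions report failure through their return value rather than through errno, so a zero errno does not show that the earlier calls succeeded.

```cpp
#include <pthread.h>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: print a pthread error number (pthread calls return
// it directly instead of setting errno) and pass it through.
static int check(int rc, const char *what) {
    if (rc != 0)
        std::fprintf(stderr, "%s failed: %s\n", what, std::strerror(rc));
    return rc;
}

// Hypothetical worker with the exact signature pthread_create expects,
// so no function-pointer cast is needed.
static void *worker(void *arg) {
    *static_cast<int *>(arg) = 42;
    return NULL;
}

// Assumption 3: returns true if threads created with attr start joinable.
bool attr_is_joinable(const pthread_attr_t *attr) {
    int state = 0;
    return pthread_attr_getdetachstate(attr, &state) == 0 &&
           state == PTHREAD_CREATE_JOINABLE;
}

// Assumptions 1, 2 and 4: default attributes (NULL), every return value
// checked, exactly one pthread_join and no pthread_detach.
// Returns 0 on success, otherwise the first pthread error number.
int run_checked(int *out) {
    pthread_t tid;
    int rc = check(pthread_create(&tid, NULL, worker, out), "pthread_create");
    if (rc != 0)
        return rc;
    return check(pthread_join(tid, NULL), "pthread_join");
}
```

If a custom attribute object really is needed, calling `attr_is_joinable(&attr)` right before pthread_create is a cheap way to rule out assumption 3 as the culprit.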

[update]

The RATIONALE section for pthread_detach says:

"The pthread_join() or pthread_detach() functions should eventually be called for every thread that is created so that storage associated with the thread may be reclaimed."

Although it does not say these are mutually exclusive, the pthread_join documentation specifies:

"The behavior is undefined if the value specified by the thread argument to pthread_join() does not refer to a joinable thread."

I am having trouble finding the exact wording that says a detached thread is not joinable, but I am pretty sure it is true.

So, either call pthread_join or pthread_detach, but not both.
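As an illustration of the join-only option, here is a rough sketch of a shutdown sequence that never calls pthread_detach. The `Shutdown` struct, `worker`, and `start_and_stop_worker` are hypothetical stand-ins for the question's mtx()/startCond() helpers, not the original code. Because pthread_join does not return until the target thread has terminated, the separate "end" condition from the question should not be needed in this variant.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical shared state standing in for the question's mtx()/startCond().
struct Shutdown {
    pthread_mutex_t mtx;
    pthread_cond_t cond;
    bool exitRequested;
    bool workerFinished;  // set by the worker just before it returns
};

static void *worker(void *arg) {
    Shutdown *s = static_cast<Shutdown *>(arg);
    pthread_mutex_lock(&s->mtx);
    while (!s->exitRequested)  // loop guards against spurious wakeups
        pthread_cond_wait(&s->cond, &s->mtx);
    s->workerFinished = true;
    pthread_mutex_unlock(&s->mtx);
    return NULL;  // returning from the start routine acts like pthread_exit(NULL)
}

// Starts a joinable worker, asks it to exit, and joins it exactly once.
// Returns 0 on success, otherwise a pthread error number.
int start_and_stop_worker(bool *finished) {
    Shutdown s;
    pthread_mutex_init(&s.mtx, NULL);
    pthread_cond_init(&s.cond, NULL);
    s.exitRequested = false;
    s.workerFinished = false;

    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, &s);  // no pthread_detach
    if (rc == 0) {
        pthread_mutex_lock(&s.mtx);
        s.exitRequested = true;
        pthread_cond_signal(&s.cond);
        pthread_mutex_unlock(&s.mtx);

        // pthread_join blocks until the worker has terminated, so it is
        // safe to read and destroy the shared state afterwards.
        rc = pthread_join(tid, NULL);
    }
    *finished = s.workerFinished;
    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.mtx);
    return rc;
}
```

The opposite choice (detach and never join) would keep the question's end-condition handshake and simply drop the pthread_join call.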

Nemo answered Oct 13 '22 09:10