
pthread_join crashes intermittently with segmentation fault on OSX

I'm getting a segmentation fault while joining a child thread, and I've exhausted every debugging option I could think of, including searching Stack Overflow and the rest of the Internet. I'll be as thorough as I can. The code is written in C++ and compiled with GCC on OS X 10.6.8. I link against the pthread library using the '-pthread' option. I've also tried '-lpthread', with no difference.

I'm using the following global variables:

pthread_t gTid;

pthread_attr_t gAttr;

int gExitThread = 0;

I'm creating a child thread from my main thread of execution:

err = pthread_attr_init(&gAttr);
if (err)
{
    throw CONTROLLER_THREAD_ERROR;
}

err = pthread_attr_setdetachstate(&gAttr, PTHREAD_CREATE_JOINABLE);
if (err)
{
    throw CONTROLLER_THREAD_ERROR;
}

err = pthread_create(&gTid,&gAttr,threadHandler,NULL);
if (err)
{
    throw CONTROLLER_THREAD_ERROR;
}

Inside 'threadHandler', I have the following run loop, which uses the Core Foundation API:

// Enter run loop
result = CFRunLoopRunInMode(kCFRunLoopDefaultMode, RUN_LOOP_TIMEOUT, false);
while (result == kCFRunLoopRunTimedOut)
{
    if (gExitThread) break;
    result = CFRunLoopRunInMode(kCFRunLoopDefaultMode, RUN_LOOP_TIMEOUT, false);
}

The 'gExitThread' global variable signals that the thread should exit gracefully. The RUN_LOOP_TIMEOUT macro is set to 2 seconds (larger and smaller values make no difference).

The main thread asks the child thread to exit with the following code:

int err = 0;
void* exitValue = NULL;

printf("Stopping controller thread...\n");

gExitThread = 1;
err = pthread_join(gTid, &exitValue);
if (err)
{
    displayError2(err);
    throw CONTROLLER_THREAD_ERROR;
}

err = pthread_attr_destroy(&gAttr);
if (err)
{
    throw CONTROLLER_THREAD_ERROR;
}

The call to 'pthread_join' crashes with a segmentation fault after a short delay. I've also noticed that replacing the 'pthread_join' call with a plain two-second sleep, 'usleep(2000000)', causes the exact same segmentation fault during the sleep. The backtraces from the core dumps for both 'pthread_join' and 'usleep' are below.

pthread_join:

#0  0x00007fff8343aa6a in __semwait_signal ()
#1  0x00007fff83461896 in pthread_join ()
#2  0x000000010000179d in Controller::cleanup () at src/native/osx/controllers.cpp:335
#3  0x0000000100008e51 in ControllersTest::performTest (this=0x100211bf0) at unittests/src/controllers_test.cpp:70
#4  0x000000010000e5b9 in main (argc=2, argv=0x7fff5fbff980) at unittests/src/verify.cpp:34

usleep(2000000):

#0  0x00007fff8343aa6a in __semwait_signal ()
#1  0x00007fff8343a8f9 in nanosleep ()
#2  0x00007fff8343a863 in usleep ()
#3  0x000000010000177b in Controller::cleanup () at src/native/osx/controllers.cpp:335
#4  0x0000000100008e3d in ControllersTest::performTest (this=0x100211bf0) at unittests/src/controllers_test.cpp:70
#5  0x000000010000e5a5 in main (argc=2, argv=0x7fff5fbff980) at unittests/src/verify.cpp:34

Any help will be greatly appreciated.

lawrenceB asked Oct 07 '11 10:10


1 Answer

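The crash is probably not in 'pthread_join' itself. Both backtraces show only the main thread, which is simply blocked in '__semwait_signal' at the moment the process dies. It seems that the code after your while loop inside 'threadHandler' is causing the segfault. If a fatal signal (e.g. SIGSEGV) is generated inside any thread, the whole process is killed, not just that thread.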
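To illustrate that a fault in a worker thread takes down the whole process while the main thread sits in 'pthread_join', here is a minimal sketch. It is not your code: 'faultingWorker' and 'signalThatKilledChild' are hypothetical names, the fault is simulated with 'raise(SIGSEGV)', and the experiment runs in a forked child so the calling process survives to inspect the result.

```cpp
#include <cassert>
#include <csignal>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical worker that faults on purpose. raise() delivers the signal
// to the calling thread, standing in for a real bad memory access.
static void* faultingWorker(void*)
{
    raise(SIGSEGV);
    return NULL;
}

// Runs the experiment in a forked child process.
// Returns the number of the signal that terminated the child,
// 0 if the child exited normally, or -1 on a setup error.
int signalThatKilledChild()
{
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0)
    {
        // Child: make sure SIGSEGV has its default (fatal) disposition.
        signal(SIGSEGV, SIG_DFL);

        pthread_t tid;
        if (pthread_create(&tid, NULL, faultingWorker, NULL) != 0) _exit(2);

        // The main thread blocks here, just like in the posted backtrace,
        // while the worker thread is the one that actually faults.
        pthread_join(tid, NULL);
        _exit(0);  // not reached if the worker's signal kills the process
    }

    // Parent: collect the child's termination status.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling 'signalThatKilledChild()' should report SIGSEGV even though the child's main thread never touched invalid memory, which is why a backtrace of the main thread alone can be misleading.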

Try running the program under GDB (or loading the core dump) and using 'thread apply all bt' to get a backtrace for every thread, not just the main one.
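Separately from the crash, the 'gExitThread' flag in the question is a plain int written by one thread and read by another without synchronization, which is formally a data race. It is probably not the cause of the segfault, but it is worth tightening. The sketch below shows one way to do it, assuming a C++11 compiler with 'std::atomic' (the GCC shipped with OS X 10.6 may not provide it, in which case a mutex-protected flag would serve the same purpose). 'usleep' stands in for 'CFRunLoopRunInMode' so the example builds without Core Foundation, and 'runAndStopWorker' is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <unistd.h>

// Shared stop flag. std::atomic makes the cross-thread write visible
// without a data race; the original post uses a plain int instead.
static std::atomic<int> gExitThread(0);

// Stand-in for the question's threadHandler: usleep() replaces the
// CFRunLoopRunInMode() call (assumption made to keep the sketch portable).
static void* threadHandler(void*)
{
    while (!gExitThread.load())
    {
        usleep(10000);  // 10 ms poll, playing the role of RUN_LOOP_TIMEOUT
    }
    // Return a non-null marker so the joiner can confirm a clean exit.
    return &gExitThread;
}

// Starts the worker, requests shutdown, and joins it.
// Returns 0 on success, otherwise the first pthread error code seen.
int runAndStopWorker()
{
    gExitThread.store(0);

    pthread_t tid;
    int err = pthread_create(&tid, NULL, threadHandler, NULL);
    if (err) return err;

    usleep(50000);         // let the worker go around its loop a few times
    gExitThread.store(1);  // ask the worker to leave its loop

    void* exitValue = NULL;
    err = pthread_join(tid, &exitValue);
    if (err) return err;

    // The worker came back through its normal return path.
    assert(exitValue == &gExitThread);
    return 0;
}
```

If 'runAndStopWorker()' returns 0 here but the real program still crashes, that points back at whatever 'threadHandler' does after leaving the run loop.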

Milan answered Oct 21 '22 11:10