Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is gettimeofday async signal safe ? and can it cause deadlock if used in signal handler?

Previously I had asked a question regarding how to terminate thread blocked for I/O. I have used pthread_kill() instead of pthread_cancel() or writing to pipes, considering few advantages. After implementing the code to send signal (SIGUSR2) to the target thread using pthread_kill(), initially it worked fine and didn't face any issues.

static void signalHandler(int signum) {
    LogInfo("SIGNAL HANDLER : %d",signum);  //print info message using logger utility
}

void setSignalHandler() {
    struct sigaction actions;

    memset(&actions, 0, sizeof(actions));
    sigemptyset(&actions.sa_mask);
    actions.sa_flags = 0;
    actions.sa_handler = signalHandler;

    int rc = sigaction(SIGUSR2,&actions,NULL);
    if(rc < 0) {
        LogError("Error in sigaction");  //print error using logger utility
    }
}

void stopThread() {
    fStop = 1;  //set the flag to stop the thread
    if( ftid ) { //ftid is pthread_t variable of thread that needs to be terminated.
        LogInfo("Before pthread_kill()");
        int kill_result = pthread_kill(ftid,SIGUSR2);  // send signal to interrupt the thread if blocked for poll() I/O
        LogInfo("pthread_kill() returned : %d ( %s )",kill_result,strerror(kill_result));
        int result = pthread_join( ftid, NULL );
        LogInfo("Event loop stopped for %s", fStreamName);
        LogInfo( "pthread_join() -> %d ( %s )\n", result, strerror(result) );
        if( result == 0 ) ftid = 0;
    }
}

Recently I have noticed a deadlock issue where few threads (which tried to log statements) blocked with below stacktrace

#0  0x00007fc1b2dc565c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007fc1b2d70f0c in _L_lock_2462 () from /lib64/libc.so.6
#2  0x00007fc1b2d70d47 in __tz_convert () from /lib64/libc.so.6
#3  0x00000000004708f7 in Logger::getCurrentTimestamp(char*) ()

getCurrentTimestamp(char*) function is called from our logger module when LogInfo() or LogError() functions are invoked. This function internally calls gettimeofday() to print the current time in logs. So I'm suspecting LogInfo() function (which calls gettimeofday()) called inside signalHandler() might be causing the deadlock issue. But i'm not confident of this as this issue is not getting reproduced frequently and i did not get any information stating that gettimeofday() is not async signal safe.

gettimeofday() is thread safe and i'm considering it's not async signal safe as it was not listed under async signal safe functions.

1) Can I consider gettimeofday() is not async signal safe and it was the root cause of deadlock ?

2) Are there any known issues where use of pthread_kill() might cause deadlock ?

EDIT :

Below is the definition of getCurrentTimeStamp() function

char* Logger::getCurrentTimestamp ( char *tm_buff ) {
    char curr_time[ 32 ] = { 0 };
    time_t current = time( NULL );
    struct timeval detail_time;

    if ( tm_buff ) {
        strftime( curr_time, sizeof(curr_time), "%Y-%m-%d-%H:%M:%S", localtime( &current ) );
        gettimeofday( &detail_time, NULL );
        sprintf( tm_buff, "%s:%03d", curr_time, (int) (detail_time.tv_usec / 1000) );
        return (tm_buff);
    }
    return (NULL);
}

stack trace of the thread that was interrupted by signal.

#0  0x00007fc1b2dc565c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007fc1b2d70f0c in _L_lock_2462 () from /lib64/libc.so.6
#2  0x00007fc1b2d70d47 in __tz_convert () from /lib64/libc.so.6
#3  0x00000000004708f7 in Logger::getCurrentTimestamp(char*) ()
#4  0x000000000046e80f in Log::write(Logger*, LogType, std::string const&, char const*) ()
#5  0x000000000046df74 in Log::i(Logger*, char const*, std::string const&, std::string const&, char const*, ...) ()
#6  0x0000000000456b67 in ?? ()
#7  <signal handler called>
#8  0x00007fc1b2da8dc5 in _xstat () from /lib64/libc.so.6
#9  0x00007fc1b2d71056 in __tzfile_read () from /lib64/libc.so.6
#10 0x00007fc1b2d703b4 in tzset_internal () from /lib64/libc.so.6
#11 0x00007fc1b2d70d63 in __tz_convert () from /lib64/libc.so.6
#12 0x00000000004708f7 in Logger::getCurrentTimestamp(char*) ()
#13 0x000000000047030e in Logger::writeLog(int, std::string&, std::string&) ()
#14 0x0000000000470633 in Logger::Write(Logger*, int, std::string, std::string const&) ()
#15 0x000000000046eb02 in Log::write(Logger*, LogType, std::string const&, char const*) ()
#16 0x000000000046df74 in Log::i(Logger*, char const*, std::string const&, std::string const&, char const*, ...) ()
like image 759
Durgesh Avatar asked Nov 05 '18 13:11

Durgesh


2 Answers

gettimeofday itself is not signal safe, as you already found out.

However, "POSIX.1-2008 marks gettimeofday() as obsolete, recommending the use of clock_gettime(2) instead."(source) and clock_gettime is signal-safe (and also thread-safe), so that might be an alternative for you.

like image 195
hellow Avatar answered Oct 21 '22 06:10

hellow


gettimeofday is not defined as async-signal-safe, but if you pass 0 for the second (timezone) argument, it doesn't have to do anything that time and clock_gettime (both of which are officially async-signal-safe) don't also do, so as a matter of QoI it should be async-signal-safe in that case.

Your deadlock is inside the internal libc function __tz_convert.

#2  0x00007fc1b2d70d47 in __tz_convert () from /lib64/libc.so.6
#3  0x00000000004708f7 in Logger::getCurrentTimestamp(char*) ()

It appears to have been called directly from Logger::getCurrentTimestamp, but that's because it was "tail called" from a documented API function. There are only four functions in GNU libc that call __tz_convert (I grepped the source code): localtime, localtime_r, gmtime, and gmtime_r. Therefore, your problem is not that you are calling gettimeofday, but that you are calling one of those functions.

localtime and gmtime are obviously not async-signal-safe since they write to a global variable. localtime_r and gmtime_r are not async-signal-safe either, because they have to look at the global database of timezone information (yes, even gmtime_r does this — it might be possible to change it not to need to do that, but it still wouldn't be a thing you could rely on cross-platform).

I don't think there's a good workaround. Formatted output from an async signal handler is going to trip over all kinds of other problems; my advice is to restructure your code so that you never need to call logging functions from async signal handlers.

like image 24
zwol Avatar answered Oct 21 '22 06:10

zwol