 

pthread_cond_timedwait does not return in GHC FFI

I've tried to implement a Haskell Control.Concurrent.MVar that resides in shared memory and allows communication between multiple independent processes/programs using POSIX functionality. But I have run into lots of deadlocks.

The problem is that pthread_cond_timedwait sometimes does not return when called via the GHC FFI (whether the import is interruptible or unsafe). After a few days of desperate attempts to resolve the problem, I decided to minimize the code and ask the community for help. Unfortunately, I could not condense the problem into a few lines of code pastable here. Therefore, I put the (smallest possible) code on GitHub together with instructions on how to reproduce the problem; here is a permalink to its current state (mvar-fail branch).

In essence, the functions to take and put the MVar look like this:

int mvar_take(MVar *mvar, ...) {
   pthread_mutex_timedlock(&(mvar->statePtr->mvMut), &timeToWait);
   while ( !(mvar->statePtr->isFull) ) {
     pthread_cond_signal(&(mvar->statePtr->canPutC));
     pthread_cond_timedwait(&(mvar->statePtr->canTakeC), &(mvar->statePtr->mvMut), &timeToWait);
   }
   memcpy(localDataPtr, mvar->dataPtr, mvar->statePtr->dataSize);
   mvar->statePtr->isFull = 0;
   pthread_mutex_unlock(&(mvar->statePtr->mvMut));
   return 0; /* the Haskell side treats 0 as success */
}

int mvar_put(MVar *mvar, ...) {
   pthread_mutex_timedlock(&(mvar->statePtr->mvMut), &timeToWait);
   while ( mvar->statePtr->isFull ) {
     pthread_cond_signal(&(mvar->statePtr->canTakeC));
     pthread_cond_timedwait(&(mvar->statePtr->canPutC), &(mvar->statePtr->mvMut), &timeToWait);
   }
   memcpy(mvar->dataPtr, localDataPtr, mvar->statePtr->dataSize);
   mvar->statePtr->isFull = 1;
   pthread_mutex_unlock(&(mvar->statePtr->mvMut));
   return 0; /* the Haskell side treats 0 as success */
}

(Plus error checking and printfs after every call.) Full code for mvar_take. The initialization happens as follows:

pthread_mutexattr_init(&(s.mvMAttr));
pthread_mutexattr_settype(&(s.mvMAttr), PTHREAD_MUTEX_ERRORCHECK);
pthread_mutexattr_setpshared(&(s.mvMAttr), PTHREAD_PROCESS_SHARED);
pthread_mutex_init(&(s.mvMut), &(s.mvMAttr));
pthread_condattr_init(&(s.condAttr));
pthread_condattr_setpshared(&(s.condAttr), PTHREAD_PROCESS_SHARED);
pthread_cond_init(&(s.canPutC), &(s.condAttr));
pthread_cond_init(&(s.canTakeC), &(s.condAttr));

Full code. The Haskell part looks like this:

foreign import ccall interruptible "mvar_take"
  mvar_take :: Ptr StoredMVarT -> Ptr a -> CInt -> IO CInt
foreign import ccall interruptible "mvar_put"
  mvar_put :: Ptr StoredMVarT -> Ptr a -> CInt -> IO CInt

takeMVar :: Storable a => StoredMVar a -> IO a
takeMVar (StoredMVar _ fp) = withForeignPtr fp $ \p -> alloca $ \lp -> do
    r <- mvar_take p lp
    if r == 0
    then peek lp
    else throwErrno $ "takeMVar failed with code " ++ show r

putMVar :: Storable a => StoredMVar a -> a -> IO ()
putMVar (StoredMVar _ fp) x = withForeignPtr fp $ \p -> alloca $ \lp -> do
    poke lp x
    r <- mvar_put p lp
    unless (r == 0)
      $ throwErrno $ "putMVar failed with code " ++ show r

Full code. Changing the FFI import from interruptible to unsafe does not prevent the deadlock. Sometimes the deadlock happens every second run; sometimes it happens only after 50 runs (the other runs execute as expected).

My guess is that GHC might interfere with POSIX mutexes through its OS signal handling, but I don't know GHC internals well enough to verify it.

Am I doing something stupidly wrong, or do I need to add some special tricks to make it work inside the GHC FFI?

P.S.: the latest version of the README with my investigations is available at interprocess mvar-fail.

UPDATE 13.06.2018: I tried to temporarily block all OS signals by surrounding the function body with the following:

sigset_t mask, omask;
sigfillset(&mask);
sigprocmask(SIG_SETMASK, &mask, &omask);
...
sigprocmask(SIG_SETMASK, &omask, NULL);

This did not help.
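One side note on the experiment above: POSIX leaves the use of sigprocmask unspecified in a multithreaded process (such as a program running on GHC's threaded runtime), and pthread_sigmask is the thread-scoped equivalent. A minimal sketch of the same "block everything, then restore" wrapper using it, with hypothetical helper names block_all_signals and restore_signals:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Hypothetical helper: block every blockable signal in the CALLING THREAD
   only, saving the previous mask in *old. Returns 0 or an error number. */
int block_all_signals(sigset_t *old)
{
    sigset_t all;
    sigfillset(&all); /* SIGKILL and SIGSTOP cannot be blocked; they are ignored here */
    return pthread_sigmask(SIG_SETMASK, &all, old);
}

/* Hypothetical helper: restore the mask saved by block_all_signals(). */
int restore_signals(const sigset_t *old)
{
    return pthread_sigmask(SIG_SETMASK, old, NULL);
}
```

It would be used exactly like the sigprocmask pair: call block_all_signals(&omask) before the critical section and restore_signals(&omask) after it. Only the calling thread's mask changes; other RTS threads keep theirs.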

asked Jun 12 '18 at 05:06 by artem


1 Answer

Well, as expected, this was my fault: a very C-beginner error. As one can see from the initialization snippet, I keep the mutex and the condition variables in a structure. What one cannot see from the snippet here, but can see via the links I gave (on GitHub), is that I copy that structure into shared memory. Not only is that not allowed for mutexes (POSIX leaves the result of using a copy of a mutex undefined), but I also copy it before everything in the structure is initialized.

That is, I just copied a C structure where I should have set a pointer into the shared memory and initialized the structure through it.
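A minimal sketch of that fix, assuming POSIX shared memory via shm_open/mmap (the repository's actual code is not reproduced here). The struct SharedMVarState, its field names, and the helpers mvar_state_create/mvar_state_destroy are hypothetical; the point is that the segment is mapped first and the process-shared mutex and condition variables are initialized in place through the pointer, never in a local struct that is copied afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical layout; field names mirror the question's mvMut/canPutC/canTakeC/isFull. */
typedef struct {
    pthread_mutex_t mvMut;
    pthread_cond_t  canPutC;
    pthread_cond_t  canTakeC;
    int             isFull;
} SharedMVarState;

/* Create the segment `name` (e.g. "/mvar-demo") and initialise everything
   INSIDE the mapping. Returns NULL on failure. Return codes of the
   pthread_*_init calls are not checked, for brevity. */
SharedMVarState *mvar_state_create(const char *name)
{
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd == -1)
        return NULL;
    if (ftruncate(fd, sizeof(SharedMVarState)) == -1) {
        close(fd);
        shm_unlink(name);
        return NULL;
    }
    SharedMVarState *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (s == MAP_FAILED) {
        shm_unlink(name);
        return NULL;
    }

    pthread_mutexattr_t ma;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_settype(&ma, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&s->mvMut, &ma);   /* initialised in place, via the pointer */
    pthread_mutexattr_destroy(&ma);       /* safe: does not affect the mutex */

    pthread_condattr_t ca;
    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&s->canPutC, &ca);
    pthread_cond_init(&s->canTakeC, &ca);
    pthread_condattr_destroy(&ca);

    s->isFull = 0;
    return s;
}

/* Tear down: destroy the objects, unmap, and remove the name. */
int mvar_state_destroy(SharedMVarState *s, const char *name)
{
    pthread_cond_destroy(&s->canTakeC);
    pthread_cond_destroy(&s->canPutC);
    pthread_mutex_destroy(&s->mvMut);
    int rc = munmap(s, sizeof *s);
    return shm_unlink(name) == 0 ? rc : -1;
}
```

Other processes would open the same name with shm_open(name, O_RDWR, 0) and mmap it without initializing anything again. On glibc older than 2.34, shm_open may additionally require linking with -lrt.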

The most surprising part is that the code still works sometimes. Here is the link to the erroneous code.
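For comparison, a minimal in-process sketch of the take/put pattern from the question, written in the conventional condition-variable style: wait while the predicate is false, change the state, then signal the other condition. It also treats pthread_cond_timedwait's ETIMEDOUT as a reason to give up, and computes the absolute CLOCK_REALTIME deadline that the call expects. DemoMVar, demo_take, demo_put and the long payload are hypothetical simplifications (the question memcpy's a buffer); for inter-process use the struct would have to live and be initialized in shared memory with PTHREAD_PROCESS_SHARED attributes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical single-slot MVar holding a long. */
typedef struct {
    pthread_mutex_t mut;
    pthread_cond_t  canPut;
    pthread_cond_t  canTake;
    int             isFull;
    long            value;
} DemoMVar;

/* pthread_cond_timedwait takes an ABSOLUTE deadline (CLOCK_REALTIME for a
   default-initialised condition variable), not a relative timeout. */
static struct timespec deadline_after_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec  += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec  += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Returns 0 on success or an error number such as ETIMEDOUT. */
int demo_take(DemoMVar *m, long *out, long timeout_ms)
{
    struct timespec until = deadline_after_ms(timeout_ms);
    int rc = pthread_mutex_timedlock(&m->mut, &until);
    if (rc != 0)
        return rc;
    while (!m->isFull) {                     /* re-check after every wakeup */
        rc = pthread_cond_timedwait(&m->canTake, &m->mut, &until);
        if (rc != 0 && !m->isFull) {         /* timed out (mutex is re-held) */
            pthread_mutex_unlock(&m->mut);
            return rc;
        }
    }
    *out = m->value;
    m->isFull = 0;
    pthread_cond_signal(&m->canPut);         /* state changed: wake one putter */
    pthread_mutex_unlock(&m->mut);
    return 0;
}

int demo_put(DemoMVar *m, long v, long timeout_ms)
{
    struct timespec until = deadline_after_ms(timeout_ms);
    int rc = pthread_mutex_timedlock(&m->mut, &until);
    if (rc != 0)
        return rc;
    while (m->isFull) {
        rc = pthread_cond_timedwait(&m->canPut, &m->mut, &until);
        if (rc != 0 && m->isFull) {
            pthread_mutex_unlock(&m->mut);
            return rc;
        }
    }
    m->value = v;
    m->isFull = 1;
    pthread_cond_signal(&m->canTake);        /* state changed: wake one taker */
    pthread_mutex_unlock(&m->mut);
    return 0;
}
```

Signaling after the state change (rather than before waiting) means a waiter is only woken when its predicate can actually have become true.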

answered Nov 12 '22 at 18:11 by artem