
How do I recover a semaphore when the process that decremented it to zero crashes?

I have multiple apps compiled with g++, running in Ubuntu. I'm using named semaphores to co-ordinate between different processes.

All works fine except in the following situation: If one of the processes calls sem_wait() or sem_timedwait() to decrement the semaphore and then crashes or is killed -9 before it gets a chance to call sem_post(), then from that moment on, the named semaphore is "unusable".

By "unusable", what I mean is the semaphore count is now zero, and the process that should have incremented it back to 1 has died or been killed.

I cannot find a sem_*() API that might tell me the process that last decremented it has crashed.

Am I missing an API somewhere?

Here is how I open the named semaphore:

sem_t *sem = sem_open(
    "/testing",
    O_CREAT     |   // create the semaphore if it does not already exist
    O_CLOEXEC   ,   // close on execute
    S_IRWXU     |   // permissions:  user
    S_IRWXG     |   // permissions:  group
    S_IRWXO     ,   // permissions:  other
    1           );  // initial value of the semaphore

Here is how I decrement it:

struct timespec timeout = { 0, 0 };
clock_gettime( CLOCK_REALTIME, &timeout );
timeout.tv_sec += 5;

if ( sem_timedwait( sem, &timeout ) )
{
    throw "timeout while waiting for semaphore";
}
Stéphane asked Jan 13 '10 01:01




1 Answer

Turns out there isn't a way to reliably recover the semaphore. Sure, anyone can call sem_post() on the named semaphore to bring the count back above zero, but how can you tell when such a recovery is needed? The API is too limited: nothing in it indicates that the process which decremented the semaphore has died.

Also beware of the IPC command-line tools: ipcmk, ipcrm, and ipcs only handle the older SysV semaphores. They specifically do not work with POSIX semaphores.

But it looks like there are other resources that can be used as locks, and which the operating system releases automatically when an application dies, even in a way that cannot be caught in a signal handler. Two examples: a listening socket bound to a particular port, or a lock on a specific file.

I decided the lock on a file is the solution I needed. So instead of a sem_wait() and sem_post() call, I'm using:

lockf( fd, F_LOCK, 0 ) 

and

lockf( fd, F_ULOCK, 0 ) 

When the application exits in any way, the file is automatically closed, which also releases the file lock. Other client apps waiting for the "semaphore" are then free to proceed as expected.
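For anyone wanting something more complete, here is a minimal sketch of how the lockf() calls above could be wrapped. The helper names (open_lock_file, acquire, acquire_for, release) and the lock file path are my own illustrations, not part of any library. Since lockf() has no timed variant, acquire_for() only approximates the 5-second sem_timedwait() from the question by polling with F_TLOCK; treat that as one possible approach rather than the definitive one.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: open (creating if needed) a file used purely as a lock.
// lockf() needs a descriptor that is open for writing.
int open_lock_file(const char *path) {
    return open(path, O_RDWR | O_CREAT | O_CLOEXEC, 0666);
}

// Block until the lock is acquired, retrying if a signal interrupts the wait.
// Returns false on any other error (errno is left set by lockf).
bool acquire(int fd) {
    assert(fd >= 0);
    while (lockf(fd, F_LOCK, 0) != 0) {
        if (errno != EINTR) return false;
    }
    return true;
}

// Assumption: emulate a timed wait by polling with the non-blocking F_TLOCK.
// Returns true if the lock was acquired before the timeout elapsed.
bool acquire_for(int fd, std::chrono::milliseconds timeout) {
    assert(fd >= 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        if (lockf(fd, F_TLOCK, 0) == 0) return true;
        // EACCES or EAGAIN means another process currently holds the lock.
        if (errno != EACCES && errno != EAGAIN) return false;
        if (std::chrono::steady_clock::now() >= deadline) return false;
        usleep(10 * 1000);  // wait 10 ms before trying again
    }
}

// Release the lock explicitly. The kernel also drops it when the process
// exits or is killed, which is the property the semaphore lacked.
bool release(int fd) {
    assert(fd >= 0);
    return lockf(fd, F_ULOCK, 0) == 0;
}
```

Each process would call open_lock_file() once on an agreed path, then bracket its critical section with acquire() (or acquire_for()) and release(). Two caveats to keep in mind: these locks belong to the process rather than to a thread, so they do not serialize threads inside one process, and closing any descriptor that refers to the same file within that process drops the lock.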

Thanks for the help, guys.

Stéphane answered Oct 29 '22 16:10