Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Boost interprocess mutexes and checking for abandonment

Tags:

c++

windows

boost

I have a need for interprocess synchronization around a piece of hardware. Because this code will need to work on Windows and Linux, I'm wrapping with Boost Interprocess mutexes. Everything works well accept my method for checking abandonment of the mutex. There is the potential that this can happen and so I must prepare for it.

I've abandoned the mutex in my testing and, sure enough, when I use scoped_lock to lock the mutex, the process blocks indefinitely. I figured the way around this is by using the timeout mechanism on scoped_lock (since much time spent Googling for methods to account for this don't really show much, boost doesn't do much around this because of portability reasons).

Without further ado, here's what I have:

#include <boost/interprocess/sync/named_recursive_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

typedef boost::interprocess::named_recursive_mutex MyMutex;
typedef boost::interprocess::scoped_lock<MyMutex> ScopedLock;

MyMutex* pGate = new MyMutex(boost::interprocess::open_or_create, "MutexName");

{
    // ScopedLock lock(*pGate); // this blocks indefinitely
    boost::posix_time::ptime timeout(boost::posix_time::microsec_clock::local_time() + boost::posix_time::seconds(10));
    ScopedLock lock(*pGate, timeout); // a 10 second timeout that returns immediately if the mutex is abandoned ?????
    if(!lock.owns()) {
        delete pGate;
        boost::interprocess::named_recursive_mutex::remove("MutexName");
        pGate = new MyMutex(boost::interprocess::open_or_create, "MutexName");
    }
}

That, at least, is the idea. Three interesting points:

  • When I don't use the timeout object, and the mutex is abandoned, the ScopedLock ctor blocks indefinitely. That's expected.
  • When I do use the timeout, and the mutex is abandoned, the ScopedLock ctor returns immediately and tells me that it doesn't own the mutex. Ok, perhaps that's normal, but why isn't it waiting for the 10 seconds I'm telling it too?
  • When the mutex isn't abandoned, and I use the timeout, the ScopedLock ctor still returns immediately, telling me that it couldn't lock, or take ownership, of the mutex and I go through the motions of removing the mutex and remaking it. This is not at all what I want.

So, what am I missing on using these objects? Perhaps it's staring me in the face, but I can't see it and so I'm asking for help.

I should also mention that, because of how this hardware works, if the process cannot gain ownership of the mutex within 10 seconds, the mutex is abandoned. In fact, I could probably wait as little as 50 or 60 milliseconds, but 10 seconds is a nice "round" number of generosity.

I'm compiling on Windows 7 using Visual Studio 2010.

Thanks, Andy

like image 441
Andrew Falanga Avatar asked Apr 02 '13 19:04

Andrew Falanga


2 Answers

When I don't use the timeout object, and the mutex is abandoned, the ScopedLock ctor blocks indefinitely. That's expected

The best solution for your problem would be if boost had support for robust mutexes. However Boost currently does not support robust mutexes. There is only a plan to emulate robust mutexes, because only linux has native support on that. The emulation is still just planned by Ion Gaztanaga, the library author. Check this link about a possible hacking of rubust mutexes into the boost libs: http://boost.2283326.n4.nabble.com/boost-interprocess-gt-1-45-robust-mutexes-td3416151.html

Meanwhile you might try to use atomic variables in a shared segment.

Also take a look at this stackoverflow entry: How do I take ownership of an abandoned boost::interprocess::interprocess_mutex?

When I do use the timeout, and the mutex is abandoned, the ScopedLock ctor returns immediately and tells me that it doesn't own the mutex. Ok, perhaps that's normal, but why isn't it waiting for the 10 seconds I'm telling it too?

This is very strange, you should not get this behavior. However: The timed lock is possibly implemented in terms of the try lock. Check this documentation: http://www.boost.org/doc/libs/1_53_0/doc/html/boost/interprocess/scoped_lock.html#idp57421760-bb This means, the implementation of the timed lock might throw an exception internally and then returns false.

inline bool windows_mutex::timed_lock(const boost::posix_time::ptime &abs_time)
{
   sync_handles &handles =
      windows_intermodule_singleton<sync_handles>::get();
   //This can throw
   winapi_mutex_functions mut(handles.obtain_mutex(this->id_));
   return mut.timed_lock(abs_time);
}

Possibly, the handle cannot be obtained, because the mutex is abandoned.

When the mutex isn't abandoned, and I use the timeout, the ScopedLock ctor still returns immediately, telling me that it couldn't lock, or take ownership, of the mutex and I go through the motions of removing the mutex and remaking it. This is not at all what I want.

I am not sure about this one, but I think the named mutex is implemented by using a shared memory. If you are using Linux, check for the file /dev/shm/MutexName. In Linux, a file descriptor remains valid until that is not closed, no matter if you have removed the file itself by e.g. boost::interprocess::named_recursive_mutex::remove.

like image 155
Gabor Marton Avatar answered Sep 30 '22 15:09

Gabor Marton


Check out the BOOST_INTERPROCESS_ENABLE_TIMEOUT_WHEN_LOCKING and BOOST_INTERPROCESS_TIMEOUT_WHEN_LOCKING_DURATION_MS compile flags. Define the first symbol in your code to force the interprocess mutexes to time out and the second symbol to define the timeout duration.

I helped to get them added to the library to solve the abandoned mutex issue. It was necessary to add it due to many interprocess constructs (like message_queue) that rely on the simple mutex rather than the timed mutex. There may be a more robust solution in the future, but this solution has worked just fine for my interprocess needs.

I'm sorry I can't help you with your code at the moment; something is not working correctly there.

like image 42
user2005187 Avatar answered Sep 30 '22 15:09

user2005187