
thread safety of MPI send using threads created with std::async

According to this website, the use of MPI::COMM_WORLD.Send(...) is thread safe. However, in my application I often (though not always) run into deadlocks or get segmentation faults. Enclosing each call to the MPI::COMM_WORLD methods in a mutex.lock()/mutex.unlock() pair consistently removes both the deadlocks and the segfaults.

This is how I create threads:

const auto communicator = std::make_shared<Communicator>();
std::vector<std::future<size_t>> handles;
for ( size_t i = 0; i < n; ++i )
{
   handles.push_back(std::async(std::launch::async, foo, communicator));
}
for ( size_t i = 0; i < n; ++i )
{
   handles[i].get();
}

Communicator is a class which has a std::mutex member and exclusively calls methods such as MPI::COMM_WORLD.Send() and MPI::COMM_WORLD.Recv(). I do not use any other methods of sending/receiving with MPI. foo takes a const std::shared_ptr<Communicator> & as argument.

My question: Is the thread safety promised by MPI not compatible with threads created by std::async?

asked Feb 12 '13 by stefan



2 Answers

Thread-safety in MPI doesn't work out of the box. First, you have to ensure that your implementation actually supports multiple threads making MPI calls at once. With some MPI implementations, for example Open MPI, this requires the library to be configured with special options at build time. Then you have to tell MPI to initialise at the appropriate thread support level. Currently the MPI standard defines four levels of thread support:

  • MPI_THREAD_SINGLE - means that the user code is single threaded. This is the default level at which MPI is initialised if MPI_Init() is used;
  • MPI_THREAD_FUNNELED - means that the user code is multithreaded, but only the main thread makes MPI calls. The main thread is the one which initialises the MPI library;
  • MPI_THREAD_SERIALIZED - means that the user code is multithreaded, but calls to the MPI library are serialised;
  • MPI_THREAD_MULTIPLE - means that the user code is multithreaded and all threads can make MPI calls at any time with no synchronisation whatsoever.

In order to initialise MPI with thread support, one has to use MPI_Init_thread() instead of MPI_Init():

int provided;

MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
if (provided < MPI_THREAD_MULTIPLE)
{
    printf("ERROR: The MPI library does not have full thread support\n");
    MPI_Abort(MPI_COMM_WORLD, 1);
}

Equivalent code with the obsoleted (and removed from MPI-3) C++ bindings:

int provided = MPI::Init_thread(argc, argv, MPI::THREAD_MULTIPLE);
if (provided < MPI::THREAD_MULTIPLE)
{
    printf("ERROR: The MPI library does not have full thread support\n");
    MPI::COMM_WORLD.Abort(1);
}

Thread support levels are ordered like this: MPI_THREAD_SINGLE < MPI_THREAD_FUNNELED < MPI_THREAD_SERIALIZED < MPI_THREAD_MULTIPLE, so any provided level other than MPI_THREAD_MULTIPLE has a lower numerical value - that's why the if (...) check above is written the way it is.

MPI_Init(&argc, &argv) is equivalent to MPI_Init_thread(&argc, &argv, MPI_THREAD_SINGLE, &provided). Implementations are not required to initialise exactly at the requested level - rather they could initialise at any other level (higher or lower), which is returned in the provided output argument.
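If you need to check the provided level later, for example deep inside library code that did not perform the initialisation itself, MPI_Query_thread() returns the level at which MPI was actually initialised. A short sketch (requires an MPI installation to build and run):

```cpp
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Ask for full thread support; the implementation may provide less. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    /* MPI_Query_thread() can be called anywhere after initialisation to
       find the level that was actually granted. */
    int level;
    MPI_Query_thread(&level);
    if (level < MPI_THREAD_MULTIPLE)
        printf("Warning: only thread support level %d is provided\n", level);

    MPI_Finalize();
    return 0;
}
```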

For more information - see §12.4 of the MPI standard, freely available here.

With most MPI implementations, the thread support at level MPI_THREAD_SINGLE is actually equivalent to that provided at level MPI_THREAD_SERIALIZED - exactly what you observe in your case.

Since you haven't specified which MPI implementation you use, here is a handy list.

I've already said that Open MPI has to be compiled with the proper flags enabled in order to support MPI_THREAD_MULTIPLE. But there is another catch - its InfiniBand component is not thread-safe and hence Open MPI would not use native InfiniBand communication when initialised at full thread support level.
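Assuming a reasonably recent Open MPI installation, you can check whether your build supports MPI_THREAD_MULTIPLE with ompi_info (the exact output format and configure flag vary between Open MPI versions, so treat this as a sketch):

```shell
# Report whether this Open MPI build was compiled with full thread support
ompi_info | grep -i "thread support"

# When building Open MPI from source, full thread support is enabled at
# configure time; e.g. older 1.x releases used a flag along the lines of:
#   ./configure --enable-mpi-thread-multiple ...
```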

Intel MPI comes in two different flavours - one with and one without support for full multithreading. Multithreaded support is enabled by passing the -mt_mpi option to the MPI compiler wrapper, which links against the MT version of the library. This option is also implied if OpenMP support or the autoparalleliser is enabled. I do not know how the InfiniBand driver in IMPI behaves when full thread support is enabled.

MPICH(2) does not support InfiniBand, hence it is thread-safe and probably most recent versions provide MPI_THREAD_MULTIPLE support out of the box.

MVAPICH is the basis on which Intel MPI is built and it supports InfiniBand. I have no idea how it behaves at full thread support level when used on a machine with InfiniBand.

The note about multithreaded InfiniBand support is important, since many compute clusters nowadays use InfiniBand fabrics. With the IB component (the openib BTL in Open MPI) disabled, most MPI implementations fall back to another protocol, for example TCP/IP (the tcp BTL in Open MPI), which results in much slower, higher-latency communication.

answered Sep 20 '22 by Hristo Iliev


There are four levels of MPI thread safety, not all of them supported by every implementation: MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED and MPI_THREAD_MULTIPLE. The last one, which allows a process to have multiple threads that may simultaneously call MPI functions, is probably the one you are interested in. So, first of all, you need to make sure your implementation supports MPI_THREAD_MULTIPLE.

The required level of thread safety must be specified by a call to MPI_Init_thread. After you have called MPI_Init_thread, you should be able to safely call MPI functions from threads you create yourself - whether POSIX threads, Boost threads, or the threads backing std::async.

answered Sep 16 '22 by piokuc