Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Thousands of reader/writer locks in a single process

I am currently designing a C++ cross-platform (Linux/Windows) server application with a large-scale synchronization pattern. I internally use boost::thread as an abstraction of the OS-specific threads. My problem is to protect an array of data, each element of the array being protected by an independent reader/writer lock.

My array contains 4096 elements. Considering the solution of the "writer-priority readers-writers" problem that is presented in the "Little Book of Semaphores" (page 85), my application would need 5 semaphores per array element. This gives a total of about 20000 semaphores (or, equivalently, 20000 mutexes + 20000 condition variables).

An additional specificity of my application is that a given time, most semaphores are not active (there is typically about 32 "client" threads waiting/signaling on the thousands of semaphores). Note that since the entire server runs in a single process, I use lightweight, thread-based semaphores (not interprocess semaphores).

My question is twofold:

  1. Is it recommended to create a total of 20000 semaphores on Linux and on Windows for a single process? Well, of course, I guess this is not the case...

  2. If this practice is not recommended, what technique could I use to reduce the number of actual semaphores, e.g. to create a set of N "emulated semaphores" on the top of 1 actual semaphore? I suppose that this would be an interesting solution, because most of my semaphores are inactive at a given time.

Thanks in advance!

Summary of the answers so far

  1. Using thousands of semaphores is not recommended, especially from a cross-platform perspective. And so, even if they are not interprocess semaphores (they still consume handles under Windows).
  2. A direct way to solve my problem is to split my array into e.g. 64 subarrays of 16 elements, and to associate each of these subarrays with a single read/write lock. Unfortunately, this introduces much contention (1 writer would block the reads to 15 elements).
  3. Digging into Boost source code, I have found that:

    • The implementation of "boost::mutex" does not wrap CRITICAL_SECTION objects under Windows (but CreateEvent and ReadWriteBarrier),
    • "boost::shared_mutex" uses CreateSemaphore under Windows (which are heavyweight, interprocess objects) , and
    • "boost::shared_mutex" does not wrap "pthread_rwlock_t" under Linux.

    The reasons for this do not seem clear to me. In particular, the use of interprocess objects for "boost::shared_mutex" under Windows seems sub-optimal to me.

Summary of the open questions so far

  1. How can I create a set of N "emulated semaphores" on the top of 1 actual semaphore, keeping the contention between the emulated semaphores as small as possible?
  2. How do "boost::mutex" and "boost::shared_mutex" compare with their native counterparts (CRITICAL_SECTION and pthread_rwlock_t)?
like image 990
Tisys Avatar asked Aug 06 '11 08:08

Tisys


1 Answers

  1. This is not recommended. You should not do this actually because in Windows it would consume 1 Handle Object per Semaphore. A process can only manage a specific amount of Handles objects. Thread/Process and other Windows objects may need to use Handle objects and will get crashed if they can't. This is similar in Linux with the file-descriptor concept.

  2. Split your 4096 elements into 30 (for example) sets of 140 elements and assign to each 140-group a single Semaphore. Then 30 (in this example) threads will try to access to those 30 sets and they will get sinchronized based on each 140-group-Semaphore.

like image 106
Damiox Avatar answered Oct 24 '22 16:10

Damiox