I am currently designing a C++ cross-platform (Linux/Windows) server application with a large-scale synchronization pattern. I internally use boost::thread as an abstraction of the OS-specific threads. My problem is to protect an array of data, each element of the array being protected by an independent reader/writer lock.
My array contains 4096 elements. Considering the solution of the "writer-priority readers-writers" problem that is presented in the "Little Book of Semaphores" (page 85), my application would need 5 semaphores per array element. This gives a total of about 20000 semaphores (or, equivalently, 20000 mutexes + 20000 condition variables).
An additional specificity of my application is that, at any given time, most semaphores are not active (there are typically about 32 "client" threads waiting/signaling on the thousands of semaphores). Note that since the entire server runs in a single process, I use lightweight, thread-based semaphores (not interprocess semaphores).
My question is twofold:
Is it recommended to create a total of 20000 semaphores on Linux and on Windows for a single process? Well, of course, I guess this is not the case...
If this practice is not recommended, what technique could I use to reduce the number of actual semaphores, e.g. to create a set of N "emulated semaphores" on top of 1 actual semaphore? I suppose that this would be an interesting solution, because most of my semaphores are inactive at a given time.
Thanks in advance!
Digging into Boost source code, I have found that:
The reasons for this do not seem clear to me. In particular, the use of interprocess objects for "boost::shared_mutex" under Windows seems sub-optimal to me.
This is not recommended. On Windows, each semaphore consumes one handle, and a process can only hold a limited number of handles. Threads, processes, and other Windows objects also need handles, and creating them will fail once the limit is reached. Linux imposes a comparable per-process limit on file descriptors, although thread-level pthread mutexes and unnamed POSIX semaphores do not themselves consume descriptors.
Instead, split your 4096 elements into, for example, 30 sets of about 140 elements each and assign a single semaphore to each set. Your 30 or so threads then access those sets and are synchronized on the semaphore of whichever set they touch.
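The grouping idea above could be sketched roughly as follows. This is only an illustration under some assumptions: it uses C++17 `std::shared_mutex` instead of the semaphore-based lock from the question (`boost::shared_mutex` could be swapped in), it uses 32 groups of 128 contiguous elements so that the groups divide 4096 evenly, and the names `StripedArray`, `kElements`, `kStripes` and `stripe_of` are made up for the example.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <shared_mutex>

// Hypothetical constants: 4096 elements as in the question,
// 32 groups chosen here to match the ~32 client threads.
constexpr std::size_t kElements = 4096;
constexpr std::size_t kStripes = 32;

template <typename T>
class StripedArray {
public:
    // Map an element index to the group (and thus the lock) that guards it.
    // Contiguous blocks of kElements / kStripes = 128 elements share a lock.
    static std::size_t stripe_of(std::size_t i) {
        return i / (kElements / kStripes);
    }

    // Readers of the same group may run concurrently.
    T read(std::size_t i) const {
        std::shared_lock<std::shared_mutex> lock(locks_[stripe_of(i)]);
        return data_[i];
    }

    // A writer gets exclusive access to the whole group, not just element i.
    void write(std::size_t i, const T& value) {
        std::unique_lock<std::shared_mutex> lock(locks_[stripe_of(i)]);
        data_[i] = value;
    }

private:
    std::array<T, kElements> data_{};
    mutable std::array<std::shared_mutex, kStripes> locks_;
};
```

The trade-off is that two threads touching different elements of the same group now contend with each other, so the number of groups is a tuning knob. Note also that `std::shared_mutex` makes no promise about writer priority; if that property matters, the per-group lock would have to be the writer-priority construction from the question.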