
Performance of threads in C++11

I am interested in the performance of mutexes and message passing in the latest GCC, with threads based on pthreads, on an Ubuntu development environment. A good generic problem for this is the dining philosophers, where each philosopher uses a left-hand and a right-hand fork shared with the left and right neighbour. I increase the number of philosophers to 99 to keep my quad-core processor busy.

    int result = try_lock(forks[lhf], forks[rhf]);

The above code lets a philosopher attempt to grab the two forks needed to eat.

    // if the forks are locked then start eating
    if (result == -1)
    {
        state[j] = philosophers::State::Eating;
        eating[j]++;
        if (longestWait < waiting[j])
        {
            longestWait = waiting[j];
        }
        waiting[j] = 0;
    } else {
        state[j] = philosophers::State::Thinking;
        thinking[j]++;
        waiting[j]++;
    }

The above code tracks each philosopher's progress: eating if they managed to reserve both forks, otherwise thinking.

    {
        testEnd te(eating[j]+thinking[j]-1);
        unique_lock<mutex> lk(cycleDone);
        endCycle.wait(lk, te);
    }

The above code waits for all the philosophers to complete their selection; after that, each philosopher is free to make a new attempt:

    if ( philosophers::State::Eating == state[j] )
    {
        state[j] = philosophers::State::Thinking;
        forks[lhf].unlock();
        forks[rhf].unlock();
    }

I have a main thread that monitors the philosophers and moves them from one cycle to the next, allowing them about 10 seconds to eat and think as much as they can. The result is about 9540 cycles, with some philosophers starving and others having plenty to eat and lots of thinking time! So I need to protect my philosophers from starvation and from waiting too long, so I add more logic to prevent over-eating by requiring an eating philosopher to release the forks and think, rather than grab the same forks again after a very small break:

    // protect the philosopher against starvation
    if (State::Thinking == previous)
    {
        result = try_lock(forks[lhf], forks[rhf]);
    }

Now I have 9598 cycles, with every philosopher getting a relatively equal share of eating (2620-2681) and thinking, and a longest wait of 14. Not bad. But I am not satisfied, so now I get rid of all the mutexes and locks and keep it simple: the even philosophers eat in even cycles and the odd philosophers in odd cycles. I use a simple method of syncing the philosophers:

    while (counter < counters[j])
    {
        this_thread::yield();
    }

This prevents a philosopher from eating or thinking too many times, using a global cycle counter. Over the same time period the philosophers manage about 73543 cycles, with 36400 eating and no more than 3 cycles of waiting. So my simple algorithm with no locks is both faster and distributes the processing more evenly between the threads.

Can anyone think of a better way to solve this problem? I fear that when I implement a complex system with multiple threads, following traditional mutex and message-passing techniques will leave me with slower-than-necessary and possibly unbalanced processing on the various threads in my system.

Pete asked Jun 09 '13


1 Answer

This is an interesting way to explore the issues of threading in C++.

To address specific points:

I fear that when I implement a complex system with multiple threads that if I follow traditional mutex and message passing techniques I will end up with slower than necessary and possible unbalanced processing on the various threads in my system.

Unfortunately, the best answer I can give you is that this fear is well founded. The cost of scheduling and synchronization is very specific to the application, though -- it becomes an engineering decision when designing a large system. First and foremost, scheduling is NP-hard (http://en.wikipedia.org/wiki/Multiprocessor_scheduling), but good approximations exist.

As for your particular example, I think it is difficult to draw general conclusions from the results you present. There is one primary take-home point: the trade-off between coarse-grained and fine-grained synchronization. This is a well-studied problem, and some of the research may be helpful (e.g. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=744377&tag=1).

Overall, this touches on an engineering issue that is going to be specific to the problem you want to solve, the operating system and the hardware.

hazydev answered Oct 03 '22