
Performance of threads in C++11

I am interested in the performance of mutexes and message passing in the latest GCC, with threads based on pthreads, on an Ubuntu development environment. A good generic problem for this is the dining philosophers, where each philosopher uses a left-hand and a right-hand fork shared with the left and right neighbour. I increase the number of philosophers to 99 to keep my quad-core processor busy.

    int result = try_lock(forks[lhf], forks[rhf]);

The above code lets a philosopher attempt to grab the two forks needed to eat.

    // if the forks are locked then start eating
    if (result == -1)
    {
        state[j] = philosophers::State::Eating;
        eating[j]++;
        if (longestWait < waiting[j])
        {
            longestWait = waiting[j];
        }
        waiting[j] = 0;
    } else {
        state[j] = philosophers::State::Thinking;
        thinking[j]++;
        waiting[j]++;
    }

The above code tracks each philosopher's progress: eating if they managed to reserve both forks, otherwise thinking.

    {
        testEnd te(eating[j]+thinking[j]-1);
        unique_lock<mutex> lk(cycleDone);
        endCycle.wait(lk, te);
    }

The above code waits for all the philosophers to complete their selection; after that, each philosopher is free to make a new attempt:

    if ( philosophers::State::Eating == state[j] )
    {
        state[j] = philosophers::State::Thinking;
        forks[lhf].unlock();
        forks[rhf].unlock();
    }

I have a main thread that monitors the philosophers and moves them from one cycle to the next, allowing them about 10 seconds to eat and think as much as they can. The result is about 9540 cycles, with some philosophers starving and others having plenty to eat and lots of thinking time! So I need to protect my philosophers from starvation and from waiting too long, so I add more logic to prevent over-eating by requiring an eating philosopher to release the forks and think, rather than grab the same forks again after a very small break:

    // protect the philosopher against starvation
    if (State::Thinking == previous)
    {
        result = try_lock(forks[lhf], forks[rhf]);
    }

Now I have 9598 cycles, with every philosopher getting a relatively equal share of eating (2620-2681) and thinking, and a longest wait of 14. Not bad. But I am not satisfied, so now I get rid of all the mutexes and locks and keep it simple: the even philosophers eat in even cycles and the odd philosophers in odd cycles. I use a simple method of syncing the philosophers:

    while (counter < counters[j])
    {
        this_thread::yield();
    }

This prevents a philosopher from eating or thinking too many times, using a global cycle counter. Over the same time period the philosophers manage about 73543 cycles, with 36400 eating and no more than 3 cycles of waiting. So my simple algorithm with no locks is both faster and distributes the processing more evenly between the threads.

Can anyone think of a better way to solve this problem? I fear that when I implement a complex system with multiple threads, following traditional mutex and message-passing techniques will leave me with slower-than-necessary and possibly unbalanced processing on the various threads in my system.

Pete asked Jun 09 '13


1 Answer

This is an interesting way to explore the issues of threading in C++.

To address specific points:

I fear that when I implement a complex system with multiple threads that if I follow traditional mutex and message passing techniques I will end up with slower than necessary and possible unbalanced processing on the various threads in my system.

Unfortunately, the best answer I can give you is that this fear is well founded. The cost of scheduling and synchronization is very specific to the application, though -- it becomes an engineering decision when designing a large system. First and foremost, scheduling is NP-hard (http://en.wikipedia.org/wiki/Multiprocessor_scheduling), but good approximations exist.

As for your particular example, I think it is difficult to draw general conclusions from the results you present. There is one primary take-home point: the trade-off between coarse-grained and fine-grained synchronization. This is a well-studied problem, and some of the research may be helpful (e.g. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=744377&tag=1).

Overall, this touches on an engineering issue that is going to be specific to the problem you want to solve, the operating system and the hardware.

hazydev answered Oct 03 '22