Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Quick sort with multithreading in c++

I implement a quicksort program with multi-thread method, in C++ with a Portfolio task.

The method of portfolio tasks is to maintain a queue of tasks. Each free thread picks a task from the portfolio, executes it, if necessary generating new subtasks and placing them in to the portfolio

But I'm not sure what is right! It seems to me, that in one thread, the algorithm works faster than two or four thread. Can I somehow messed with the synchronization?

Thanks for anybody to help me.

Code:

#include <thread>
#include <chrono>
#include <mutex>
#include <condition_variable>
#include <iostream>
#include <queue>
#include <vector>
#include <set>
#include <ctime>
#include <algorithm>

using namespace std;

//print array
template <typename T>
void print(const vector<T> &arr)
{
    for (size_t i = 0; i < arr.size(); i++)
        cout << arr[i] << " ";
    cout << endl;
}

//queue tasks
queue< pair<int, int> > tasks;
//mutexs for set and queue task
mutex q_mutex, s_mutex;
//condition variable
condition_variable cv;
//set
set<int> ss;

//partition algorithm
template <typename T>
int partition(vector<T> &arr, int l, int r)
{
    T tmp = arr[r]; //as pivot element
    int i = l - 1;

    for (int j = l; j <= r - 1; j++)
        if (arr[j] < tmp)
        {
            i++;
            swap(arr[i], arr[j]);       
        }

    swap(arr[i + 1], arr[r]);
    i++;
    return i;
}

//quick sort
template <typename T>
void quick_sort(vector<T> &arr)
{
    while (true)
    {
        unique_lock<mutex> u_lock(q_mutex); //lock mutex

        //sort is fineshed
        if ( ss.size() == arr.size() ) //u_lock.unlock()
            return;

        //if queue task is not empty 
        if ( tasks.size() > 0 )
        {
            //get task from queue
            pair<int, int> cur_task = tasks.front();            
            tasks.pop();

            int l = cur_task.first, r = cur_task.second;        

            if (l < r)
            {
                int q = partition(arr, l, r); //split array

                //Add indexes in set
                s_mutex.lock();
                ss.insert(q);
                ss.insert(l);
                ss.insert(r);
                s_mutex.unlock();

                //push new tasks for left and right part
                tasks.push( make_pair(l, q - 1) );
                tasks.push( make_pair(q + 1, r) );

                //wakeup some thread which waiting 
                cv.notify_one();
            }
        }
        else
            //if queue is empty
            cv.wait(u_lock);
    }
}

//Size array
const int ARR_SIZE = 100000;
//Count threads
const int THREAD_COUNT = 8;

thread thrs[THREAD_COUNT];

//generatin array
void generate_arr(vector<int> &arr)
{
    srand(time( NULL ));

    std::generate(arr.begin(), arr.end(), [](){return rand() % 10000; });
}

//check for sorting
bool is_sorted(const vector<int> &arr)
{
    for (size_t i = 0; i < arr.size() - 1; i++)
        if ( ! (arr[i] <= arr[i + 1]) ) 
            return false;
    return true;
}

int main()
{
    //time
    clock_t start, finish;

    vector<int> arr(ARR_SIZE);

    //generate array
    generate_arr(arr);

    cout << endl << "Generating finished!" << endl << endl;

    cout << "Array before sorting" << endl << endl;

    //Before sorting
    print(arr);

    cout << endl << endl;

    cout << "Checking is_sorted finished! The result is " << (is_sorted(arr) == 0? "false": "true") << "." << endl << endl;

    //add task
    tasks.push( make_pair(0, arr.size() - 1) );

    //==================================================
    start = clock();

    for (int i = 0; i < THREAD_COUNT; i++)
        thrs[i] = thread( quick_sort<int>, ref(arr) );

    finish = clock();
    //==================================================

    for (auto& th : thrs)
        th.join();

    cout << "Sorting finished!" << endl << endl;

    cout << "Array after sorting" << endl << endl;
    //After sorting
    print(arr);

    cout << endl << endl;

    cout << "Checking is_sorted finished! The result is " << (is_sorted(arr) == 0? "false": "true") << "." << endl << endl;

    cout << "Runtime: " << (double)(finish - start) / CLOCKS_PER_SEC << endl;

    return 0;
}
like image 787
rekrut Avatar asked Jul 01 '26 09:07

rekrut


1 Answers

There are a lot more factors to performance than just how many threads you throw at the problem. Among them,

  • You need to have actual concurrency, not just multiple threads. As @Rakete1111 and @user1034749 both observed, you don't.

  • Standard quicksort has good locality of reference, especially when partition sizes get small, but your technique throws a lot of that away because responsibility for a given array element is likely to be swapped to a different thread upon every partitioning.

  • Additionally, mutex operations are not particularly cheap, and you start doing quite a lot of those relative to the amount of actual sorting when the partitions get small.

  • It doesn't make sense to use more threads than you have physical cores. Four threads is probably not too many, but it depends on your hardware.

Here are some ways you might improve your multi-threaded performance:

  1. In method quick_sort(), do not hold mutex q_mutex locked during the actual sorting, as currently you do (the unique_lock constructor you are using locks the mutex, and you do not unlock it during the lifetime of the unique_lock).

  2. Switch to the ordinary recursive technique for partitions smaller than some threshold size. You'll have to test to find a good specific threshold value; perhaps it needs to be tunable.

  3. At each partitioning, have each thread post only one of the sub-partitions to the portfolio; let it handle the other recursively -- or better, iteratively. In fact, make it the smaller sub-partition that you post, as that will put a better bound on the size of the portfolio.

You might also consider increasing the number of elements on which you run your test. 100000 isn't really that many, and you might see different performance characteristics for larger problems. 1000000 elements is not at all unreasonable for such a test on modern hardware.

like image 174
John Bollinger Avatar answered Jul 03 '26 21:07

John Bollinger



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!