I'm doing a study on parallel programming and testing it on Sorting algorithms.
The easiest way I found to do it is using OpenMP, as it offer a simple way to implement threads.
I did a research and found that other people already done it, and then I tried some codes. But, when I test it with perf stat -r 10 -d
on Linux I'm getting a worse time than the serialized code (in some cases it is double the time).
I tried with a different number of elements on the array, the maximum I used was 1.000.000 numbers, as if I use more I get a error.
void merge(int aux[], int left, int middle, int right){
int temp[middle-left+1], temp2[right-middle];
for(int i=0; i<(middle-left+1); i++){
temp[i]=aux[left+i];
}
for(int i=0; i<(right-middle); i++){
temp2[i]=aux[middle+1+i];
}
int i=0, j=0, k=left;
while(i<(middle-left+1) && j<(right-middle))
{
if(temp[i]<temp2[j]){
aux[k++]=temp[i++];
}
else{
aux[k++]=temp2[j++];
}
}
while(i<(middle-left+1)){
aux[k++]=temp[i++];
}
while(j<(right-middle)){
aux[k++]=temp2[j++];
}
}
void mergeSort (int aux[], int left, int right){
if (left < right){
int middle = (left + right)/2;
omp_set_num_threads(2);
#pragma omp parallel
{
#pragma omp sections
{
#pragma omp section
mergeSort(aux,left,middle); //call 1
#pragma omp section
mergeSort(aux,middle+1,right); //call 2
}
}
merge(aux,left,middle,right);
}
}
int main(){
generate_list(Vet, n);
mergeSort(Vet, 0, n-1);
return(0);
}
Below are the results i'm receiving:
OpenMP code:
Performance counter stats for ./mergeomp
(10 runs):
12,489169 task-clock (msec) # 0,717 CPUs utilized ( +- 3,58% )
8 context-switches # 0,681 K/sec ( +- 6,62% )
0 cpu-migrations # 0,000 K/sec
167 page-faults # 0,013 M/sec ( +- 0,79% )
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
<not supported> L1-dcache-loads
<not supported> L1-dcache-load-misses
<not supported> LLC-loads
<not supported> LLC-load-misses
0,01743 +- 0,00211 seconds time elapsed ( +- 12,10% )
Serialized way(simple code):
Performance counter stats for ./mergesort
(10 runs):
3,757053 task-clock (msec) # 0,449 CPUs utilized ( +- 0,72% )
1 context-switches # 0,293 K/sec ( +- 16,32% )
0 cpu-migrations # 0,000 K/sec
139 page-faults # 0,037 M/sec ( +- 0,34% )
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
<not supported> L1-dcache-loads
<not supported> L1-dcache-load-misses
<not supported> LLC-loads
<not supported> LLC-load-misses
0,008375 +- 0,000276 seconds time elapsed ( +- 3,29% )
Am I doing anything wrong? I'm compiling it with the -fopenmp
flag, but don't know if merge sort is not good to be parallelized, or if my linux virtual machine (I'm running Ubuntu on a VM Virtual Box machine, my PC have a Core I7 processor) is not well configured.
Thanks for everyone I resolved the issue.
First of all I had not setted multicores on my Virtual Machine.
Then, I changed the sections
construct for task
.
I also used a bigger number of elements on my array (2 Million).
And finally I added a filter to stop using parallelism when the array is smaller then "n" elements:
void mergeSortSerial(int aux[], int left, int right){
if (left < right){
int middle = (left + right)/2;
mergeSortSerial(aux,left,middle); //call 1
mergeSortSerial(aux,middle+1,right); //call 2
merge(aux,left,middle,right);
}
}
void mergeSort (int aux[], int left, int right){
if (left < right){
if ((right-left) > 1000){
int middle = (left + right)/2;
#pragma omp task firstprivate (aux, left, middle)
mergeSort(aux,left,middle); //call 1
#pragma omp task firstprivate (aux, middle, right)
mergeSort(aux,middle+1,right); //call 2
#pragma omp taskwait
merge(aux,left,middle,right);
} else{mergeSortSerial(aux, left, right);}
}
}
I found out that 1.000.000 is the best number for "n", my algorithm is 2 times faster then the sequential.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With