Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parallel sort slower than serial sort [duplicate]

Tags:

java

sorting

I was reading about the new features in Java 8 and one of them was the new Arrays.parallelSort() method. I made some tests sorting an array of doubles and one of Strings and for Strings the parallelSort was much slower.

Here is the content of a test method for Strings:

    final int size = 10000;
    final String[] values1 = new String[size];
    final String[] values2 = new String[size];
    for (int i = 0; i < size; i++) {
        values1[i] = Integer.toString(i);
        values2[i] = values1[i];
    }
    Collections.shuffle(Arrays.asList(values1));
    Collections.shuffle(Arrays.asList(values2));
    final Comparator<String> comparator = (o1, o2) -> o2.compareTo(o1);

    long startTimeInNano = System.nanoTime();
    Arrays.sort(values1, comparator);
    long endTimeInNano = System.nanoTime();
    System.out.println("Arrays.sort: totalTimeInMicro= " + ((endTimeInNano - startTimeInNano)/1000));

    //parallel sort with java 8
    startTimeInNano = System.nanoTime();
    Arrays.parallelSort(values2,comparator);
    endTimeInNano = System.nanoTime();
    System.out.println("Arrays.parallelSort: totalTimeInMicro= " + ((endTimeInNano - startTimeInNano)/1000));

The result was:

Arrays.sort: totalTimeInMicro= 11993

Arrays.parallelSort: totalTimeInMicro= 89823

I also tried this code on another computer and the result was the same (25608 vs 808660). The computer I run the tests has an i5-2500 CPU. Do you have any idea why I get this kind of results?

like image 452
Slimu Avatar asked Mar 24 '15 13:03

Slimu


People also ask

What is the difference between sort and parallel sort?

The parallelSort() is functionally different. Unlike sort(), which sorts data sequentially using a single thread, it uses a parallel sort-merge sorting algorithm. It breaks the array into sub-arrays that are themselves sorted and then merged. For executing parallel tasks it uses the ForkJoin pool.

Which method will sort huge amount of data in lesser time in Java?

Merge Sort uses recursion to solve the problem of sorting more efficiently than algorithms previously presented, and in particular it uses a divide and conquer approach.

How does parallel sorting work?

It compares two adjacent numbers and switches them, if the first number is greater than the second number to get an ascending order list. The opposite case applies for a descending order series. Odd-Even transposition sort operates in two phases − odd phase and even phase.

What is the use of parallel array sorting in Java?

It sorts the specified range of the specified array of objects into ascending order, according to the natural ordering of its elements. The range to be sorted extends from index fromIndex, inclusive, to index toIndex, exclusive.


1 Answers

This benchmark tells you hardly anything. The most important things for a microbenchmark are

  • Give the JIT a chance to optimize the code by running the tests multiple times
  • Use different input sizes
  • Print some of the results, to prevent the JIT from optimizing away the whole calls

There are some more points to consider - in fact, many more points. You should consult How do I write a correct micro-benchmark in Java? for further information.

For really "profound" information, you should use tools like Caliper or JMH. But even with little effort, one can create a microbenchmark that shows a rough indication of how the actual performance would be. So one of the simplest forms of a microbenchmark could look like this:

import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;

public class ParallelSortSpeedTest
{
    public static void main(String[] args)
    {
        for (int size=100000; size<=1000000; size+=100000)
        {
            final String[] values1 = new String[size];
            final String[] values2 = new String[size];
            for (int i = 0; i < size; i++) {
                values1[i] = Integer.toString(i);
                values2[i] = values1[i];
            }
            Collections.shuffle(Arrays.asList(values1));
            Collections.shuffle(Arrays.asList(values2));
            final Comparator<String> comparator = (o1, o2) -> o2.compareTo(o1);

            testSort(values1, comparator);
            testParallelSort(values2, comparator);
        }
    }

    private static void testSort(
        String array[], final Comparator<String> comparator)
    {
        long startTimeInNano = System.nanoTime();
        Arrays.sort(array, comparator);
        long endTimeInNano = System.nanoTime();
        System.out.println("Arrays.sort        : totalTimeInMicro= " + 
            ((endTimeInNano - startTimeInNano)/1000)+", first "+array[0]);
    }

    private static void testParallelSort(
        String array[], final Comparator<String> comparator)
    {
        long startTimeInNano = System.nanoTime();
        Arrays.parallelSort(array, comparator);
        long endTimeInNano = System.nanoTime();
        System.out.println("Arrays.parallelSort: totalTimeInMicro= " + 
            ((endTimeInNano - startTimeInNano)/1000)+", first "+array[0]);
    }

}

This is a reasonable option, considering the trade-off between the effort of getting a JMH benchmark up and running, and the reliability of the results. This test will print something like

...
Arrays.sort        : totalTimeInMicro= 530669, first 999999
Arrays.parallelSort: totalTimeInMicro= 158227, first 999999

Showing that the parallel sort should be faster, at least.

like image 111
Marco13 Avatar answered Sep 20 '22 17:09

Marco13