sorting 50 000 000 numbers

I can't use a standard algorithm.

Therefore I am asking you about methods and algorithms :)

OK, I read about parallel mergesort, but it's not clear to me.

solution, the first version

code is located here


asked Nov 27 '10 12:11

mr. Vachovsky

2 Answers

50 million is not particularly large. I would just read them into memory, sort them and write them out. It should take just a few seconds. How fast do you need it to be? How complicated do you need it to be?

On my old laptop it took 28 seconds. If I had more processors, it might be a little faster, but much of the time is spent reading and writing the file (15 seconds), which wouldn't be any faster.

One of the critical factors is the size of your cache. The comparison itself is very cheap provided the data is in cache. As the L3 cache is shared, one thread is all you need to make full use of it.


import java.io.*;
import java.util.Arrays;
import java.util.Random;

// The class name is a placeholder; the original snippet showed only the methods.
public class SortNumbers {

    public static void main(String... args) throws IOException {
        generateFile();

        long start = System.currentTimeMillis();
        int[] nums = readFile("numbers.bin");
        Arrays.sort(nums);
        writeFile("numbers2.bin", nums);
        long time = System.currentTimeMillis() - start;
        // The elapsed time is in milliseconds and covers reading, sorting and writing.
        System.out.println("Took " + time + " ms to sort " + nums.length + " numbers.");
    }

    // Writes 50 million random ints to numbers.bin as test input.
    private static void generateFile() throws IOException {
        Random rand = new Random();
        int[] ints = new int[50 * 1000 * 1000];
        for (int i = 0; i < ints.length; i++)
            ints[i] = rand.nextInt();
        writeFile("numbers.bin", ints);
    }

    // File layout: an int count followed by that many ints.
    private static int[] readFile(String filename) throws IOException {
        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(new FileInputStream(filename), 64 * 1024))) {
            int len = dis.readInt();
            int[] ints = new int[len];
            for (int i = 0; i < len; i++)
                ints[i] = dis.readInt();
            return ints;
        }
    }

    private static void writeFile(String name, int[] numbers) throws IOException {
        // try-with-resources closes (and flushes) the stream even if a write fails.
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(name), 64 * 1024))) {
            dos.writeInt(numbers.length);
            for (int number : numbers)
                dos.writeInt(number);
        }
    }
}
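To check how much extra cores help with the sort step alone (leaving file I/O out of the measurement), one option is to compare Arrays.sort with Arrays.parallelSort. This is only a sketch, assuming Java 8 or later; the class name SortTiming and the helper sortCopy are made up for illustration, and the timings will depend heavily on the machine.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper class; not part of the original answer.
class SortTiming {

    // Sorts a copy of the input, sequentially or with the parallel variant,
    // and returns the sorted copy. The input array is left untouched.
    static int[] sortCopy(int[] data, boolean parallel) {
        int[] copy = Arrays.copyOf(data, data.length);
        if (parallel) {
            Arrays.parallelSort(copy); // Java 8+, uses the common fork/join pool
        } else {
            Arrays.sort(copy);
        }
        return copy;
    }

    public static void main(String[] args) {
        // 50 million random ints, generated in memory so no file I/O is timed.
        int[] data = new Random(42).ints(50_000_000).toArray();

        for (boolean parallel : new boolean[] {false, true}) {
            long start = System.nanoTime();
            sortCopy(data, parallel);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println((parallel ? "Arrays.parallelSort: " : "Arrays.sort: ") + ms + " ms");
        }
    }
}
```

The input plus one copy needs roughly 400 MB of heap, so a larger heap setting (for example -Xmx1g) may be required. The measured time includes copying the array, which is small compared with the sort.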

answered Oct 05 '22 22:10

Peter Lawrey

Off the top of my head, merge sort seems to be the best option when it comes to parallelisation and distribution, as it uses a divide-and-conquer approach. For more information, search for "parallel merge sort" and "distributed merge sort".

For a single-machine, multi-core example, see "Correctly multithreaded quicksort or mergesort algo in Java?". If you can use Java 7 fork/join, see "Java 7: more concurrency" and "Parallelism with Fork/Join in Java 7".

To distribute the sort over many machines, see Hadoop, which has a distributed merge sort implementation: see MergeSort and MergeSorter. Also of interest: "Hadoop Sorts a Petabyte in 16.25 Hours and a Terabyte in 62 Seconds".
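Since the question says parallel merge sort is not clear, here is a minimal sketch of the idea using the Java 7 fork/join framework: split the array in half, sort the halves as separate tasks, then merge them. The class name ParallelMergeSort and the THRESHOLD value are assumptions chosen for illustration, not taken from any of the linked articles, and the cutoff would need tuning on real hardware.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical sketch of a fork/join merge sort over the range [lo, hi).
class ParallelMergeSort extends RecursiveAction {
    // Assumed cutoff: below this size a plain sequential sort is used.
    private static final int THRESHOLD = 1 << 13;

    private final int[] a;   // array being sorted in place
    private final int[] tmp; // scratch buffer, same length as a
    private final int lo, hi;

    ParallelMergeSort(int[] a, int[] tmp, int lo, int hi) {
        this.a = a;
        this.tmp = tmp;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            Arrays.sort(a, lo, hi); // small range: sort sequentially
            return;
        }
        int mid = (lo + hi) >>> 1;
        // Sort both halves as subtasks; invokeAll returns once both are done.
        invokeAll(new ParallelMergeSort(a, tmp, lo, mid),
                  new ParallelMergeSort(a, tmp, mid, hi));
        merge(mid);
    }

    // Merges the sorted halves [lo, mid) and [mid, hi) back into a.
    private void merge(int mid) {
        System.arraycopy(a, lo, tmp, lo, hi - lo);
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) {
            a[k++] = tmp[i] <= tmp[j] ? tmp[i++] : tmp[j++];
        }
        while (i < mid) a[k++] = tmp[i++];
        while (j < hi) a[k++] = tmp[j++];
    }

    // Convenience entry point: sorts the whole array using a fresh pool.
    static void sort(int[] a) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new ParallelMergeSort(a, new int[a.length], 0, a.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling ParallelMergeSort.sort(nums) would take the place of Arrays.sort(nums). The final merge at each level runs on a single thread, which limits the speed-up this simple version can reach.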
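In a distributed or external merge sort, each worker typically sorts its own chunk and the sorted chunks are then combined in a k-way merge. The sketch below shows only that merge step, in memory, using a priority queue. It is an illustration of the general technique, not Hadoop's implementation, and the class name KWayMerge is made up. It assumes Java 8 or later for the lambda comparator.

```java
import java.util.PriorityQueue;

// Hypothetical sketch: merges several already-sorted int arrays into one.
class KWayMerge {

    static int[] merge(int[][] runs) {
        int total = 0;
        for (int[] run : runs) total += run.length;

        // Each heap entry is {value, run index, position within that run},
        // ordered by value so the smallest pending element is always on top.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        for (int r = 0; r < runs.length; r++) {
            if (runs[r].length > 0) heap.add(new int[] {runs[r][0], r, 0});
        }

        int[] out = new int[total];
        int k = 0;
        while (!heap.isEmpty()) {
            int[] entry = heap.poll();
            out[k++] = entry[0];
            int run = entry[1];
            int next = entry[2] + 1;
            // Replace the consumed element with the next one from the same run.
            if (next < runs[run].length) {
                heap.add(new int[] {runs[run][next], run, next});
            }
        }
        return out;
    }
}
```

With k runs and n elements in total, this does about n log k comparisons. In a real distributed setting the runs would be streamed from files or the network rather than held in arrays.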

answered Oct 05 '22 22:10

Neeme Praks


Tags:

java

algorithm

sorting

parallel-processing