I need to sort a 20 GB file ( which consists of random numbers) in the ascending order, But I am not understanding what technique should I use. I tried to use ArrayList in my Java Program, but it runs out of Memory. Increasing the heap size didn't work too, I guess 20 GB is too big. Can anybody guide me, how should I proceed ?
You shall use an external sorting algorithm, do not try to fit this in memory.
http://en.wikipedia.org/wiki/External_sorting
If you think it is too complex, try this:
H2 database is simple, works very well with Java and can be embedded in your JAR (does not need any kind of installation or setup).
You don't need any special tools for this really. This is a textbook case for external merge sort, wherein you read in parts of the large file at a time (say 100M), sort them, and write the sorted results to an external file. Read in another part, sort it, spit it back out, until there's nothing left to sort. Then you need to read in the sorted chunks, a smaller piece at a time (say 10M) and sort those in memory. The tricky point is to merge those sorted bits together in the right way. Read the external sorting page on Wikipedia as well, as already mentioned. Also, here's an implementation in Java that does this kind of external merge sorting.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With