I have a 5 GB text file that needs to be sorted in alphabetical order. What is the best algorithm to use?
Constraints:
Speed - As fast as possible
Memory - A PC with 1 GB of RAM running Windows XP
I routinely sort text files >2GB with the Linux sort command. It usually takes 15 to 30 seconds, depending on server load.
Just do it, it won't take as long as you think.
Update: Since you're using Windows XP, you can get the sort command in UnxUtils. I probably use that one more than the Linux version, and it's just as fast.
The bottleneck for huge files is really disk speed. My server above has a fast SATA RAID. If your machine is a desktop (or laptop), then your 7200 RPM (or 5400 RPM) IDE drives will add a few minutes to the job.
For text files, sort, at least the GNU Coreutils version in Linux and others, works surprisingly fast.
Take a look at the --buffer-size and related options, and set --temporary-directory if your /tmp directory is too small.
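As a rough illustration of those options, here is a small Python wrapper that assembles a GNU sort command line. The helper names (build_sort_command, sort_file), the 512M buffer, and the paths are my own placeholders, not anything from sort itself; only the --buffer-size, --temporary-directory and --output flags come from GNU Coreutils. It assumes a GNU sort binary is first on your PATH (on Windows, the built-in sort.exe does not understand these flags).

```python
import subprocess


def build_sort_command(src, dst, buffer_size="512M", tmp_dir=None):
    """Return an argv list for GNU sort (hypothetical helper for illustration)."""
    cmd = ["sort", f"--buffer-size={buffer_size}"]
    if tmp_dir is not None:
        # Where sort writes its intermediate runs; needs room for the whole input.
        cmd.append(f"--temporary-directory={tmp_dir}")
    cmd.append(f"--output={dst}")
    cmd.append(src)
    return cmd


def sort_file(src, dst, buffer_size="512M", tmp_dir=None):
    """Run GNU sort on src and write the result to dst; raises if sort fails."""
    subprocess.run(build_sort_command(src, dst, buffer_size, tmp_dir), check=True)
```

On a 1 GB machine, a buffer of roughly half the RAM is a guess worth experimenting with rather than a recommendation; something like sort_file("big.txt", "big.sorted.txt", tmp_dir="D:/tmp") would be the call.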
Alternatively, if you're really worried how long it might take, you can split up the file into smaller chunks, sort them individually, then merge them together (with sort --merge). Sorting each chunk can be done on different systems in parallel.
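If you want to see what the split / sort / merge idea looks like in code, here is a minimal sketch in Python. It is not how GNU sort is implemented, just the same general technique (an external merge sort). The function name external_sort and the lines_per_chunk default are assumptions of mine; tune the chunk size so one chunk fits comfortably in RAM. Note that it compares lines by Unicode code point, so uppercase letters sort before lowercase ones, which may not match your idea of "alphabetical".

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src_path, dst_path, lines_per_chunk=1_000_000, tmp_dir=None):
    """Sort a large text file using sorted temporary chunks and a k-way merge."""
    chunk_paths = []
    try:
        # Phase 1: read a bounded number of lines, sort in memory, spill to disk.
        with open(src_path, "r", encoding="utf-8") as src:
            while True:
                chunk = list(islice(src, lines_per_chunk))
                if not chunk:
                    break
                # A final line without a newline would fuse with its neighbour
                # after merging, so terminate it explicitly.
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"
                chunk.sort()
                fd, path = tempfile.mkstemp(suffix=".chunk", dir=tmp_dir)
                with os.fdopen(fd, "w", encoding="utf-8") as out:
                    out.writelines(chunk)
                chunk_paths.append(path)

        # Phase 2: lazily merge the sorted chunks; heapq.merge keeps only
        # one pending line per chunk in memory.
        files = [open(p, "r", encoding="utf-8") for p in chunk_paths]
        try:
            with open(dst_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*files))
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary chunk files whether or not the merge succeeded.
        for p in chunk_paths:
            os.remove(p)
```

The temporary chunks need about as much free disk space as the input file, so point tmp_dir at a drive with enough room. In practice the sort command will almost certainly be faster than this; the sketch is mainly useful for understanding what it does under the hood.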