Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ Sorting huge binary files

I need to sort huge binary files that won't fit into memory. It's no option to use a sort algorithm and continuously read/write from I/O device. Is there any possibility to use something like a memory mapped file?

like image 806
0xbadf00d Avatar asked Jun 18 '11 18:06

0xbadf00d


3 Answers

This is a solved problem, as explained on this wiki page: http://en.wikipedia.org/wiki/External_sorting

Basically, read in some set amount, sort it, save into a file, and repeat. Then, read in a smaller amount from each file, sort these, and continue until done.

UPDATE:

You may want to look at the java code he uses, it sounds like he solved what you need.

http://www.codeodor.com/index.cfm/2007/5/10/Sorting-really-BIG-files/1194

like image 157
James Black Avatar answered Nov 07 '22 06:11

James Black


One strategy is to sort chunks of it with quick sort or some other fast memory sort algorithm and then do a merge sort of these chunks.

like image 34
Axel Gneiting Avatar answered Nov 07 '22 07:11

Axel Gneiting


Here a nice solution with C++11:

https://github.com/alveko/external_sort

And some other options:

  • https://github.com/arq5x/kway-mergesort
  • https://stxxl.org/
  • https://github.com/owaisnm/external-sort
  • https://gist.github.com/manangandhi7/98d2cdd12c7800aaea3ecdc41a7755e3
like image 30
ferdymercury Avatar answered Nov 07 '22 08:11

ferdymercury