
How to manage huge trees on a standard PC?

  1. Given a root node that will produce a tree of about 10^10 (approx. 2^34) nodes, is it appropriate to use a memory-mapped file that will eventually contain the whole tree?
  2. What operating-system-related problems may occur (file I/O, huge file support)?
  3. Do C, gcc, and glibc have any implicit limits (e.g. pointer size)?
  4. Does Linux have any issues or limits with large files?
psihodelia asked Nov 05 '22 16:11


1 Answer

As yi_H mentioned in his comment, you'll want a 64-bit operating system and a file system that supports large files. Assuming each node contains on the order of 2^5 = 32 bytes of data, 2^40 nodes will result in 2^45 bytes = 32 TiB. Now, assuming you're not running on a modern military fighter plane, that is far more than fits in RAM, so most of that data will have to live on the hard disk.

Once the data is on your disk and the file system is properly configured, I don't think there will be problems with any system limitations. However, read/write speed will definitely be an issue. Given an average I/O speed of 100 MB/s on your hard drive, it would take about 4 days just to read through the entire tree once (2^45 bytes / 10^8 bytes per second ≈ 3.5 × 10^5 seconds), and that assumes purely sequential access.

It would be better to split the data across multiple computers and parallelize your operations.

tskuzzy answered Nov 09 '22 09:11