
RandomAccessFile with support beyond Long?

I'm currently using an instance of RandomAccessFile to manage some data, but my file needs to grow beyond 2^63 - 1 bytes, so I cannot use methods such as seek() and write(), because they take a long and cannot address anything larger. So what do I do? Is there something else I can use that supports a larger address space?

EDIT: Reason for asking this question:

I have a tree data structure which in theory can have up to 2^128 nodes, and I want to store this tree in a file. Each node's data is roughly 6 bytes, so I'm wondering how I can lay this tree out on disk.
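
For scale: 2^128 nodes at ~6 bytes each is about 2 × 10^39 bytes, while a long tops out at 2^63 - 1 ≈ 9.2 × 10^18, so a long-addressed file falls short by more than twenty orders of magnitude.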

asked Aug 02 '17 by Ahmad


3 Answers

Not a proper answer, but are you sure your file is actually this large?

From the docs for Long.MAX_VALUE:

A constant holding the maximum value a long can have, 2^63-1.

From the docs for RandomAccessFile.length():

the length of this file, measured in bytes.

Do you know how large 2^63 - 1 bytes is? That's 9,223,372,036,854,775,807 bytes:

9,223,372,036,854,775,807 B
9,223,372,036,854,775    KB
9,223,372,036,854        MB
9,223,372,036            GB
9,223,372                TB
9,223                    PB
9                        EB

If my math is right, filling such a file would require a constant write speed of about 272 GiB/s sustained for a full year.
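
A quick way to check that figure (a throwaway sketch; it assumes binary GiB and a 365-day year):

public class WriteRate {
    public static void main(String[] args) {
        long bytes = Long.MAX_VALUE;             // 2^63 - 1 bytes to fill
        long secondsPerYear = 365L * 24 * 3600;  // 31,536,000 seconds
        double gibPerSecond = (double) bytes / secondsPerYear / (1L << 30);
        System.out.printf("~%.0f GiB/s sustained for one year%n", gibPerSecond);
        // prints ~272 GiB/s
    }
}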

While this is an excellent question I would like to see an answer to, I highly doubt that you have a single file that is 9 EB in size, if the OS even supports it.

Edit:

Here are some File System Limits, and much to my own surprise, NTFS will actually support single files of up to 16 EiB; however, it is one of only a few file systems on the list that do.


If you ABSOLUTELY need to access a file larger than 9 EiB, it looks like you might need to roll your own version of RandomAccessFile, using BigInteger where RandomAccessFile uses long. That could get you up to (2^32)^Integer.MAX_VALUE bytes.
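
A minimal sketch of that idea, assuming you shard one huge logical address space across many ordinary files so that every per-file offset still fits in a long (all names here are made up for illustration):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Sketch: a BigInteger-addressed byte store backed by many long-addressable files.
public class BigRandomAccessStore {
    // Each shard file holds 2^40 bytes (1 TiB), comfortably below Long.MAX_VALUE.
    private static final int SHARD_BITS = 40;
    private static final BigInteger SHARD_SIZE = BigInteger.ONE.shiftLeft(SHARD_BITS);

    private final String baseName;
    private final Map<BigInteger, RandomAccessFile> shards = new HashMap<>();

    public BigRandomAccessStore(String baseName) {
        this.baseName = baseName;
    }

    private RandomAccessFile shardFor(BigInteger address) throws IOException {
        BigInteger index = address.shiftRight(SHARD_BITS);  // which file
        RandomAccessFile raf = shards.get(index);
        if (raf == null) {  // open shards lazily, on first touch
            raf = new RandomAccessFile(baseName + "." + index.toString(16), "rw");
            shards.put(index, raf);
        }
        return raf;
    }

    public void write(BigInteger address, byte[] data) throws IOException {
        RandomAccessFile raf = shardFor(address);
        raf.seek(address.mod(SHARD_SIZE).longValueExact());  // offset fits in a long
        raf.write(data);
    }

    public byte[] read(BigInteger address, int length) throws IOException {
        RandomAccessFile raf = shardFor(address);
        raf.seek(address.mod(SHARD_SIZE).longValueExact());
        byte[] buf = new byte[length];
        raf.readFully(buf);
        return buf;
    }
}

For brevity the sketch ignores reads and writes that straddle a shard boundary and never closes the files; a real implementation would have to handle both.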

answered Oct 10 '22 by Matt Clark


I suppose your question stems from this requirement: "Is there something else I can use that supports a larger address space?" In other words, you want to access storage by address, and your addresses can be very large.

Of course, you should not allocate a single 2^128 * 6-byte file; even if that were possible nowadays, it would be far too expensive. The typical approach is to split your storage into smaller partitions and address them accordingly, for instance:

write(partition, address, node);   // address is local to the partition
node = read(partition, address);
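
For example, a 128-bit node id could be split into a partition index and a local file offset like this (a sketch; the names and the 40-bit split point are my own assumptions):

import java.math.BigInteger;

public class NodeAddress {
    // Low 40 bits: offset inside the partition file (so each file is at most 1 TiB).
    private static final int OFFSET_BITS = 40;
    private static final BigInteger OFFSET_MASK =
            BigInteger.ONE.shiftLeft(OFFSET_BITS).subtract(BigInteger.ONE);

    public static BigInteger partitionOf(BigInteger nodeId) {
        return nodeId.shiftRight(OFFSET_BITS);  // remaining high bits pick the file
    }

    public static long offsetOf(BigInteger nodeId) {
        return nodeId.and(OFFSET_MASK).longValueExact();  // always fits in a long
    }
}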

As you said, you want to store IPv6 addresses. To store IPv6 addresses and search over them quickly, it is enough to have a table with 8 columns and an index on each 16-bit group of the address. Or you can store the information in a tree hierarchy like:

  • 0000
    • 0000
      • 0000
        • etc
    • 0001
      • 0000
        • etc

You would allocate each level on demand. So the real question is how to organize your storage effectively.
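
Here is a sketch of that on-demand hierarchy in Java (names are hypothetical; each level keys on one 16-bit group of the address, and children are only created when first touched):

import java.util.HashMap;
import java.util.Map;

// Sketch: a trie over the eight 16-bit groups of an IPv6 address.
public class Ipv6Trie {
    private final Map<Integer, Ipv6Trie> children = new HashMap<>();
    private byte[] nodeData;  // the ~6 bytes of payload, if this node is occupied

    // groups = the 8 hextets of an address, e.g. {0x2001, 0x0db8, 0, 0, 0, 0, 0, 1}
    public void put(int[] groups, byte[] data) {
        Ipv6Trie node = this;
        for (int g : groups) {
            node = node.children.computeIfAbsent(g, k -> new Ipv6Trie());  // allocate on demand
        }
        node.nodeData = data;
    }

    public byte[] get(int[] groups) {
        Ipv6Trie node = this;
        for (int g : groups) {
            node = node.children.get(g);
            if (node == null) {
                return null;  // this path was never allocated
            }
        }
        return node.nodeData;
    }
}

Only the paths you actually insert consume any space, which is the whole point at this scale.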

UPDATE

I want to note that there is in fact a private API in the JDK which can map file regions beyond the 2 GB limit of the public MappedByteBuffer API. It is not part of the public API at all, so I won't describe it here unless requested; you can find it directly in sun.nio.ch.FileChannelImpl (the private map0 and unmap0 methods).

answered Oct 10 '22 by egorlitvinenko


Even if you had software to do such things, it would be unusable at the scale you suggest, since no single machine has that much disk space.

So, since the main issue is the hardware limitation of a single machine, the solution is to use a distributed computing framework that lets you scale out as much as needed. I suggest https://ignite.apache.org/ as it's incredibly flexible and has pretty decent support here on Stack Overflow.

Coming at this from another perspective: you want to store IPv6 addresses. At the theoretical level, sure, you would need 2^128 of them. At the practical level, even if you tried to index every IP out there today, you wouldn't go significantly past 2^32, since that is the number of IPv4 addresses and we are only just passing that limit.

answered Oct 10 '22 by Carlos Bribiescas