I'm currently using an instance of RandomAccessFile to manage some in-memory data, but the size of my RandomAccessFile instance is beyond 2^64 bytes, so I cannot use methods such as seek() and write(), because they take a long offset and a long can only address up to 2^63-1 bytes. So what do I do? Is there something else I can use which supports an address space this large?
EDIT: Reason for asking this question:
I have a tree data structure which in theory can have up to 2^128 nodes, and I want to store this tree in a file. Each node holds roughly 6 bytes of data, so I'm wondering how I can store this tree in a file.
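For scale, here is a quick back-of-the-envelope check of the worst case (the class name is just for illustration):

    import java.math.BigInteger;

    public class WorstCaseSize {
        public static void main(String[] args) {
            // 2^128 nodes, ~6 bytes per node (worst case from the question)
            BigInteger nodes = BigInteger.ONE.shiftLeft(128);
            BigInteger bytes = nodes.multiply(BigInteger.valueOf(6));
            // Prints roughly 2.0e39 bytes -- far beyond what a long (2^63-1) can address
            System.out.println(bytes);
        }
    }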
Not a proper answer, but are you sure your file is actually this large?
From the docs for Long.MAX_VALUE:
A constant holding the maximum value a long can have, 2^63-1.
From the docs for RandomAccessFile.length():
the length of this file, measured in bytes.
Do you know how many bytes 2^63-1 is? Rather, 9,223,372,036,854,775,807 bytes?
9,223,372,036,854,775,807 B
9,223,372,036,854,775 KB
9,223,372,036,854 MB
9,223,372,036 GB
9,223,372 TB
9,223 PB
9 EB
If I did the math right, filling a file that size would need a constant write speed of about 292 GB/s (roughly 272 GiB/s) sustained for a full year.
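A quick sanity check of that figure (the class name is just for this example, and it assumes a 365-day year):

    public class WriteSpeed {
        public static void main(String[] args) {
            long maxBytes = Long.MAX_VALUE;              // 2^63 - 1 bytes
            long secondsPerYear = 365L * 24 * 60 * 60;   // 31,536,000 s
            double bytesPerSecond = (double) maxBytes / secondsPerYear;
            // ~2.92e11 B/s, i.e. ~292 GB/s or ~272 GiB/s
            System.out.printf("%.0f bytes/s (%.0f GiB/s)%n",
                    bytesPerSecond, bytesPerSecond / (1024.0 * 1024 * 1024));
        }
    }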
While this is an excellent question that I would like to see an answer to, I highly doubt that you have a single file that will be 9 EB in size, or that the OS will even support it.
edit
Here are some File System Limits, and much to my own surprise NTFS will actually support single files up to 16 EiB; however, it is one of only a few file systems on the list that do.
If you ABSOLUTELY need to access a file larger than 9 EiB, it looks like you might need to roll your own version of RandomAccessFile, using BigInteger where the original uses long. This could get you up to (2^32)^Integer.MAX_VALUE bytes.
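As an illustration only, here is a minimal sketch of that idea; the class name BigRandomAccessFile and the chunking scheme are made up for this answer. It spreads the address space over many ordinary files so that each underlying seek() stays within long range:

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.math.BigInteger;

    // Hypothetical sketch: spreads a huge address space over many files,
    // each small enough to be addressed with an ordinary long.
    class BigRandomAccessFile {
        private static final long CHUNK_SIZE = 1L << 40;   // 1 TiB per backing file (arbitrary)
        private final File dir;

        BigRandomAccessFile(File dir) {
            this.dir = dir;
        }

        void write(BigInteger position, byte[] data) throws IOException {
            BigInteger[] qr = position.divideAndRemainder(BigInteger.valueOf(CHUNK_SIZE));
            File chunk = new File(dir, "chunk-" + qr[0].toString(16));
            try (RandomAccessFile raf = new RandomAccessFile(chunk, "rw")) {
                raf.seek(qr[1].longValueExact());   // remainder always fits in a long
                raf.write(data);
            }
        }

        byte[] read(BigInteger position, int length) throws IOException {
            BigInteger[] qr = position.divideAndRemainder(BigInteger.valueOf(CHUNK_SIZE));
            File chunk = new File(dir, "chunk-" + qr[0].toString(16));
            try (RandomAccessFile raf = new RandomAccessFile(chunk, "r")) {
                raf.seek(qr[1].longValueExact());
                byte[] buf = new byte[length];
                raf.readFully(buf);
                return buf;
            }
        }
    }

Note that a write straddling a chunk boundary would have to be split across two files; the sketch ignores that for brevity.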
I suppose that your question stems from this requirement: "Is there something else I can use which supports an address space this large?" In other words, you want to access storage by address, and your addresses can be very large.
Of course, you should not allocate a 2^128 * 6 byte file; even if that were possible nowadays, it would be far too expensive. The typical approach is to split your storage into smaller partitions and address them accordingly. For instance:
write(partition, address, node);
node = read(partition, address);
As you said, you want to store IPv6 addresses. To store IPv6 addresses and search them quickly, it is enough to have a table with 8 columns and an index for each 16-bit part of an IPv6 address. Or you can store the information in a tree hierarchy (one level per address part), which you allocate on demand; see the sketch below. So the real question is how to organize your storage effectively.
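To make the partition idea concrete, here is a minimal hedged sketch (the class name, the split point, and the partitioning scheme are invented for this answer): it splits a 128-bit key into a partition id and an offset within that partition, so each partition stays comfortably within long range.

    import java.math.BigInteger;

    // Hypothetical addressing scheme: the upper bits select a partition,
    // the lower bits are an offset inside that partition's file.
    final class NodeAddress {
        static final int OFFSET_BITS = 40;   // ~1 TiB partitions (arbitrary choice)
        static final BigInteger OFFSET_MASK =
                BigInteger.ONE.shiftLeft(OFFSET_BITS).subtract(BigInteger.ONE);

        final BigInteger partition;   // which partition file to open
        final long offset;            // position inside that file, fits in a long

        NodeAddress(BigInteger key) {
            this.partition = key.shiftRight(OFFSET_BITS);
            this.offset = key.and(OFFSET_MASK).longValueExact();
        }
    }

The write(partition, address, node) and read(partition, address) calls above would then only ever touch the partition files that have actually been created, which is what makes on-demand allocation work even though the key space is 2^128.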
UPDATE
I want to note that in reality there is a private API in Java (Oracle JDK, not OpenJDK) which can give you the opportunity to map file regions larger than 2 GB (the public FileChannel.map is limited to Integer.MAX_VALUE bytes per mapping), but it is private and not part of the public API at all, so I won't describe it here without requests. You can find it directly in sun.nio.ch.FileChannelImpl (the private map0 and unmap0 methods).
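If you only need the public API, the usual workaround is to map a large file in windows of at most Integer.MAX_VALUE bytes each; a rough sketch (the file name and window handling are placeholders):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.util.ArrayList;
    import java.util.List;

    public class ChunkedMapping {
        public static void main(String[] args) throws Exception {
            long window = Integer.MAX_VALUE;   // maximum size of one mapping
            try (RandomAccessFile raf = new RandomAccessFile("huge.dat", "r");
                 FileChannel ch = raf.getChannel()) {
                List<MappedByteBuffer> maps = new ArrayList<>();
                for (long pos = 0; pos < ch.size(); pos += window) {
                    long size = Math.min(window, ch.size() - pos);
                    maps.add(ch.map(FileChannel.MapMode.READ_ONLY, pos, size));
                }
                // maps.get(n).get(i) now reads byte (n * window + i) of the file
            }
        }
    }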
Even if you had the software to do such things, it would be unusable at the scale you suggest, since no single machine exists with that much disk space.
So, since the main issue is the hardware limitation of a single machine, the solution would be to use a distributed computing framework that lets you scale out as much as needed. I suggest https://ignite.apache.org/ as it's incredibly flexible and has pretty decent support here on Stack Overflow.
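Purely as an illustration of what "scaling out" could look like (the cache name "treeNodes" and the key/value types are made up for this example), a minimal Ignite sketch might be:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;

    public class DistributedNodes {
        public static void main(String[] args) {
            // Starts (or joins) an Ignite node; the cache is partitioned
            // across however many machines are in the cluster.
            try (Ignite ignite = Ignition.start()) {
                IgniteCache<String, byte[]> nodes = ignite.getOrCreateCache("treeNodes");
                nodes.put("2001:db8::1", new byte[6]);   // ~6 bytes of node data per key
                byte[] data = nodes.get("2001:db8::1");
                System.out.println(data.length);
            }
        }
    }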
Coming at this from another perspective: you want to store IPv6 addresses. At the theoretical level, sure, you would need 2^128 addresses. At the practical level, even if you tried to index every IP out there today, you would not go significantly past 2^32, since that is the number of IPv4 addresses and we are only just exhausting that space.