Algorithms requiring 4 GB or 5 GB numbers - Is it possible?

Okay, this problem is indeed a challenge!

Background

I am working on an arithmetic-based project involving larger than normal numbers. I knew I was going to be working with a worst-case scenario of 4 GB-capped file sizes (I was hoping to even extend that to a 5 GB cap, as I have seen file sizes greater than 4 GB before - specifically disc image *.iso files).

The Question In General

Now, the algorithm(s) to which I will apply computation do not matter at the moment, but the loading and handling of such large quantities of data - the numbers - do.

  • A System.IO.File.ReadAllBytes(String) call can only read up to 2 GB of file data, so this is my first problem - how will I go about loading, and accessing in memory, file sizes twice that large, if not more?
  • Next, I was writing my own class to treat a 'stream' or array of bytes as a big number, with operator methods to perform hex arithmetic, until I read about the System.Numerics.BigInteger class online. But given that there is no BigInteger.MaxValue, and that I can only load a maximum of 2 GB of data at a time, I don't know what the potential of BigInteger would be - even compared to the object I was writing, called Number (which does have my desired minimum potential). There were also issues with available memory and performance, though I do not care so much about speed as about completing this experimental process successfully.

Summary

  • How should I load 4-5 gigabytes of data?
  • How should I store and handle the data after having been loaded? Stick with BigInteger or finish my own Number class?
  • How should I handle such large quantities of memory during runtime without running out of memory? I'll be treating the 4-5 GB of data like any other number instead of an array of bytes - performing such arithmetic as division and multiplication.

P.S. I cannot reveal too much information about this project under a non-disclosure agreement. ;)

For those who would like to see a sample operator from my Number object - a per-byte array adder (C#):

public static Number operator +(Number n1, Number n2)
{
    // GB5_ARRAY is a cap constant for 5 GB - 5368709120L
    byte[] data = new byte[GB5_ARRAY];
    byte rem = 0x00, bA, bB, rm, dt;
    // Iterate through every byte; the final byte receives the carry, if any
    // I tested this algorithm on smaller arrays provided by the `BitConverter` class,
    // then made a few tweaks to satisfy the larger arrays and the Number object
    for (long iDx = 0; iDx < GB5_ARRAY; iDx++)
    {
        // bData is a byte[] with GB5_ARRAY number of bytes
        // Perform a check - solves for unequal (or jagged) arrays
        if (iDx < GB5_ARRAY - 1) { bA = n1.bData[iDx]; bB = n2.bData[iDx]; } else { bA = 0x00; bB = 0x00; }
        Add(bA, bB, rem, out dt, out rm);
        // set data and prepare for the next interval
        rem = rm; data[iDx] = dt;
    }
    return new Number(data);
}
private static void Add(byte a, byte b, byte r, out byte result, out byte remainder)
{
    int i = a + b + r;
    result = (byte)(i % 256);    // low byte of the sum
    remainder = (byte)(i / 256); // carry into the next byte
}
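
As a quick sanity check of the carry arithmetic above (a standalone sketch, with the Add helper reproduced inline): 200 + 100 + 1 = 301, which splits into a low byte of 45 and a carry of 1, since 301 = 1 * 256 + 45.

```csharp
using System;

class CarryDemo
{
    // Same logic as the Add helper above
    static void Add(byte a, byte b, byte r, out byte result, out byte remainder)
    {
        int i = a + b + r;
        result = (byte)(i % 256);    // low byte of the sum
        remainder = (byte)(i / 256); // carry into the next byte
    }

    static void Main()
    {
        Add(200, 100, 1, out byte result, out byte remainder);
        Console.WriteLine($"{result} carry {remainder}"); // 45 carry 1
    }
}
```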
asked Jan 16 '23 by TekuConcept

1 Answer

Normally, you would process large files using a streaming API, either raw binary (Stream) or via some protocol reader (XmlReader, StreamReader, etc). This could also be done via memory-mapped files in some cases. The key point here is that you only look at a small portion of the file at a time (a moderately sized buffer of data, a logical "line", or "node", etc - depending on the scenario).
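
For example, a chunked read over an arbitrarily large file might look like this (a minimal sketch; the file name and buffer size are placeholders, not anything from the question):

```csharp
using System;
using System.IO;

class StreamDemo
{
    static void Main()
    {
        byte[] buffer = new byte[81920]; // work on ~80 KB at a time, never the whole file
        using (var stream = File.OpenRead("huge.bin")) // hypothetical 4-5 GB file
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // process buffer[0..read) here - e.g. feed it into your big-number logic,
                // carrying state (such as an arithmetic carry) between chunks
            }
        }
    }
}
```

Because the buffer is reused between iterations, memory usage stays constant no matter how large the file is.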

Where this gets odd is your desire to map this somehow directly to some form of large number. Frankly, I don't know how we can help with that without more information, but if you are dealing with an actual number of this size, I think you're going to struggle unless the binary protocol makes that convenient. And "performing such arithmetic as division and multiplication" is meaningless on raw data; that only makes sense on parsed data with custom operations defined.

Also: note that in .NET 4.5 you can flip a configuration switch to expand the maximum size of arrays beyond the 2 GB limit. There is still a limit, but it is a bit bigger. Unfortunately, the maximum number of elements is still the same, so if you are using a byte[] array it won't help. But if you are using a SomeCompositeStruct[] you should be able to get higher usage. See gcAllowVeryLargeObjects.
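
That switch lives in the application's config file (app.config), under the runtime element:

```xml
<configuration>
  <runtime>
    <!-- allows arrays larger than 2 GB in total size on 64-bit platforms (.NET 4.5+) -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
```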

answered Jan 28 '23 by Marc Gravell