Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory limitted to about 2.5 GB for single .net process

I am writing .NET applications running on Windows Server 2016 that does an http get on a bunch of pieces of a large file. This dramatically speeds up the download process since you can download them in parallel. Unfortunately, once they are downloaded, it takes a fairly long time to pieces them all back together.

There are between 2-4k files that need to be combined. The server this will run on has PLENTLY of memory, close to 800GB. I thought it would make sense to use MemoryStreams to store the downloaded pieces until they can be sequentially written to disk, BUT I am only able to consume about 2.5GB of memory before I get an System.OutOfMemoryException error. The server has hundreds of GB available, and I can't figure out how to use them.

like image 430
Josh Dayberry Avatar asked Nov 07 '18 15:11

Josh Dayberry


3 Answers

MemoryStreams are built around byte arrays. Arrays cannot be larger than 2GB currently.

The current implementation of System.Array uses Int32 for all its internal counters etc, so the theoretical maximum number of elements is Int32.MaxValue.

There's also a 2GB max-size-per-object limit imposed by the Microsoft CLR.

As you try to put the content in a single MemoryStream the underlying array gets too large, hence the exception.

Try to store the pieces separately, and write them directly to the FileStream (or whatever you use) when ready, without first trying to concatenate them all into 1 object.

like image 106
Marcell Toth Avatar answered Nov 15 '22 11:11

Marcell Toth


According to the source code of the MemoryStream class you will not be able to store more than 2 GB of data into one instance of this class. The reason for this is that the maximum length of the stream is set to Int32.MaxValue and the maximum index of an array is set to 0x0x7FFFFFC7 which is 2.147.783.591 decimal (= 2 GB).

Snippet MemoryStream

private const int MemStreamMaxLength = Int32.MaxValue;

Snippet array

// We impose limits on maximum array lenght in each dimension to allow efficient 
// implementation of advanced range check elimination in future.
// Keep in sync with vm\gcscan.cpp and HashHelpers.MaxPrimeArrayLength.
// The constants are defined in this method: inline SIZE_T MaxArrayLength(SIZE_T componentSize) from gcscan
// We have different max sizes for arrays with elements of size 1 for backwards compatibility
internal const int MaxArrayLength = 0X7FEFFFFF;
internal const int MaxByteArrayLength = 0x7FFFFFC7;

The question More than 2GB of managed memory has already been discussed long time ago on the microsoft forum and has a reference to a blog article about BigArray, getting around the 2GB array size limit there.

Update

I suggest to use the following code which should be able to allocate more than 4 GB on a x64 build but will fail < 4 GB on a x86 build

private static void Main(string[] args)
{
    List<byte[]> data = new List<byte[]>();
    Random random = new Random();

    while (true)
    {
        try
        {
            var tmpArray = new byte[1024 * 1024];
            random.NextBytes(tmpArray);
            data.Add(tmpArray);
            Console.WriteLine($"{data.Count} MB allocated");
        }
        catch
        {
            Console.WriteLine("Further allocation failed.");
        }
    }
}
like image 21
Markus Safar Avatar answered Nov 15 '22 11:11

Markus Safar


As has already been pointed out, the main problem here is the nature of MemoryStream being backed by a byte[], which has fixed upper size.

The option of using an alternative Stream implementation has been noted. Another alternative is to look into "pipelines", the new IO API. A "pipeline" is based around discontiguous memory, which means it isn't required to use a single contiguous buffer; the pipelines library will allocate multiple slabs as needed, which your code can process. I have written extensively on this topic; part 1 is here. Part 3 probably has the most code focus.

like image 26
Marc Gravell Avatar answered Nov 15 '22 13:11

Marc Gravell