How does a big array allocate memory?

I am looking for a way to hold a big 3D sparse array structure in memory without wasting a lot of memory. Here is an experiment I ran with arrays of longs:

using System;
using System.Diagnostics;
using System.Runtime;

namespace ConsoleApp4
{
    public class Program
    {
        static Process proc = Process.GetCurrentProcess();
        const int MB = 1024 * 1024;
        const int IMAX = 5;
        const int JMAX = 100000000;
        public static void ShowTextWithMemAlloc(string text)
        {
            proc.Refresh();
            Console.WriteLine($"{text,-30}WS64:{proc.WorkingSet64/MB,5}MB  PMS64:{proc.PrivateMemorySize64/MB,5}MB");
            Console.ReadKey();
        }
        public static void Main(string[] args)
        {
            Console.Write(" ");
            ShowTextWithMemAlloc("Start.");
            long[] lArray = new long[IMAX * JMAX];
            long[] l1Array = new long[IMAX * JMAX];
            long[] l2Array = new long[IMAX * JMAX];
            long[] l3Array = new long[IMAX * JMAX];
            ShowTextWithMemAlloc("Arrays created.");
            lArray[IMAX * JMAX - 1] = 5000;
            l1Array[IMAX * JMAX - 1] = 5000;
            l2Array[IMAX * JMAX - 1] = 5000;
            l3Array[IMAX * JMAX - 1] = 5000;
            ShowTextWithMemAlloc("Last elements accessed.");
            for (var i=IMAX-1; i>= 0; i--)
            {
                for (var j=0; j<JMAX; j++)
                {
                    lArray[i * JMAX + j] = i * JMAX + j;
                }
                ShowTextWithMemAlloc($"Value for row {i} assigned.");
            }
            //lArray = new long[5];
            //l1Array = null;
            //l2Array = null;
            //l3Array = null;
            //GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
            //GC.Collect();
            //ShowTextWithMemAlloc($"GC.Collect done.");
            ShowTextWithMemAlloc("Stop.");
        }
    }
}

If you want to test it, set the COMPlus_gcAllowVeryLargeObjects environment variable (Project Properties -> Debug) to 1, or lower JMAX. This is the output:

 Start.                        WS64:   14MB  PMS64:    8MB
 Arrays created.               WS64:   15MB  PMS64:15360MB
 Last elements accessed.       WS64:   15MB  PMS64:15360MB
 Value for row 4 assigned.     WS64:  779MB  PMS64:15360MB
 Value for row 3 assigned.     WS64: 1542MB  PMS64:15360MB
 Value for row 2 assigned.     WS64: 2305MB  PMS64:15361MB
 Value for row 1 assigned.     WS64: 3069MB  PMS64:15361MB
 Value for row 0 assigned.     WS64: 3832MB  PMS64:15362MB
 Stop.                         WS64: 3844MB  PMS64:15325MB

The memory consumption shown in Task Manager matches Process.WorkingSet64. What is the real number? Why is memory allocated on assignment? Is an array actually a continuous allocated memory? Is an array an array? Do aliens exist? (dramatic background music)

Episode 2: We make a small change:

            //lArray[i * JMAX + j] = i * JMAX + j;
            var x= lArray[i * JMAX + j];

and nothing changes (in the output). What is the difference between existent and nonexistent? (more dramatic background music) Now we are waiting for an answer from one of the mysterious people (they have some number and a small 'k' under their names).

Episode 3: Another change:

    //lArray[IMAX * JMAX - 1] = 5000;
    //l1Array[IMAX * JMAX - 1] = 5000;
    //l2Array[IMAX * JMAX - 1] = 5000;
    //l3Array[IMAX * JMAX - 1] = 5000;
    //ShowTextWithMemAlloc("Last elements accessed.");
    long newIMAX = IMAX-3;
    long newJMAX = JMAX / 10;
    for (var i=0; i<newIMAX; i++)
    {
        for (var j=0; j<newJMAX; j++)
        {
            lArray[i * newJMAX + j] = i * newJMAX + j;
            //var x= lArray[i * JMAX + j];
        }
        //ShowTextWithMemAlloc($"Value for row {i} assigned.");
    }
    ShowTextWithMemAlloc($"{newIMAX*newJMAX} values assigned.");

The output:

 Start.                             WS64:   14MB  PMS64:    8MB
 Arrays created.                    WS64:   15MB  PMS64:15369MB
 20000000 values assigned.          WS64:  168MB  PMS64:15369MB
 Stop.                              WS64:  168MB  PMS64:15369MB

PMS64 for one array: (15369-8)/4 = 3840MB. This is not a sparse array, but a partially filled array ;). I am fully using this 168MB.

Answers to some questions: "Why do you not use the exact size?" Because I don't know it. The data can come from several user-defined SQL queries. "Why do you not resize it?" Resizing creates a new array and copies the values. That costs copy time and memory, and in the end the evil GC comes and eats you.

Did I waste memory? (I don't remember. The aliens?!) And if yes, how much? 0, (3840-168)MB or (15369-8-168)MB?

Epilogue:

Is a comment a comment or an answer?

Is contiguous memory actually contiguous memory?

Do answers give answers? Mysterious. (more music)

(Scully: Mulder, toads just fell from the sky! Mulder: I guess their parachutes didn't open.)

Thank you all!

Mottor asked Jun 02 '16 08:06


1 Answer

The working set is not the amount of memory allocated. It's the set of pages currently resident in physical memory for the process. Windows applies various policies to it, and the number is generally hard to interpret.

Here, the memory was likely requested from the OS as zeroed pages. The first access to a page is what actually makes a zeroed physical page available.

You should be looking at private bytes (PrivateMemorySize64, the PMS64 column in your output).

You can't sparsely allocate .NET arrays. Probably, you should look at employing some data structure that provides the impression of a sparse array.
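
One possible shape for such a data structure is a dictionary keyed by the three indices, so that only cells that were actually written take up memory. This is a minimal sketch, not a tuned implementation: the class name SparseArray3D and its members are hypothetical, and the tuple key assumes C# 7 or later with System.ValueTuple available. Each stored cell costs noticeably more than the 8 bytes of a long, so it only pays off when most cells stay empty.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the original post): a 3D "array" that
// stores only the cells that were explicitly written.
public sealed class SparseArray3D
{
    private readonly Dictionary<(int X, int Y, int Z), long> cells =
        new Dictionary<(int X, int Y, int Z), long>();

    // Number of cells that actually occupy memory.
    public int StoredCount => cells.Count;

    public long this[int x, int y, int z]
    {
        // Cells that were never written read as 0, like a fresh long[].
        get => cells.TryGetValue((x, y, z), out var value) ? value : 0L;
        set
        {
            if (value == 0L)
                cells.Remove((x, y, z)); // writing the default drops the entry
            else
                cells[(x, y, z)] = value;
        }
    }
}
```

Usage looks like a normal indexer: `var a = new SparseArray3D(); a[4, 99999999, 0] = 5000;` stores a single entry, and reading any other cell returns 0 without allocating anything.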

Is an array actually a continuous allocated memory?

Yes, from the perspective of the CLR and the .NET code running. The OS might play tricks though such as lazily faulting in the pages on the first read or write.

For "Episode 2" the answer is that the faulting happens for reads as well as for writes. I don't quite follow what episode 3 does but I assume it just touches fewer pages.
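
As a rough check of the "fewer pages" reading against the numbers in the question, the working-set growth should be close to the number of bytes touched. The sketch below only redoes that arithmetic; the helper name TouchedMegabytes is made up for illustration, and it ignores page-size rounding and any other allocations in the process.

```csharp
using System;

public static class TouchedMemory
{
    private const long MB = 1024 * 1024;

    // Megabytes (rounded down) occupied by the given number of long elements.
    public static long TouchedMegabytes(long elementsTouched)
    {
        return elementsTouched * sizeof(long) / MB;
    }
}

// One row of the first experiment: JMAX = 100000000 longs.
//   TouchedMemory.TouchedMegabytes(100000000)  -> 762
// Episode 3: newIMAX * newJMAX = 2 * 10000000 longs.
//   TouchedMemory.TouchedMegabytes(20000000)   -> 152
```

Both figures are in line with the question's output: WS64 grows by about 763MB per assigned row, and by 168 - 15 = 153MB in Episode 3.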

Did I waste memory

This is harder to say. As long as the pages are not touched, they are not physically in use. That RAM can serve the file cache, for example, or other programs' working sets. The pages do count towards the commit charge of the system, though. Windows guarantees that it can make those pages available to you, so you will not run out of memory at some random memory access. Linux, by default, does not guarantee that. It has the OOM killer as a mitigation.

In an extreme case, if you allocate 1TB like that, the sum of RAM and paging file size needs to exceed 1TB as well, even though none of that space might end up being used.

Consider using memory-mapped files. There, the file is the backing store and RAM is treated like a cache. Paging would behave the same way: a page occupies RAM only once it has been touched.
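
A minimal sketch of the memory-mapped approach, using System.IO.MemoryMappedFiles from the .NET base class library. The class and method names, the file path, and the element count are illustrative assumptions, not something from the question. Mapping a multi-gigabyte view in one piece also assumes a 64-bit process.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedArrayDemo
{
    // Treats a file as an array of longs, writes the last element and
    // reads it back. Only the pages that are touched need to be in RAM.
    public static long WriteAndReadLast(string path, long elementCount, long value)
    {
        long capacityBytes = elementCount * sizeof(long);

        // FileMode.Create makes (or overwrites) the file; capacity sets its size.
        using (var file = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Create, null, capacityBytes))
        using (var view = file.CreateViewAccessor(0, capacityBytes))
        {
            long lastOffset = (elementCount - 1) * sizeof(long); // byte offset
            view.Write(lastOffset, value);
            return view.ReadInt64(lastOffset);
        }
    }
}
```

For example, `MappedArrayDemo.WriteAndReadLast(Path.GetTempFileName(), 1000, 5000)` returns 5000. With this layout the file on disk, rather than RAM plus the paging file, is what backs the untouched part of the array.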

usr answered Oct 14 '22 17:10