Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does C# dynamically allocate memory for a List<T>?

From LukeH's answer to what is the max limit of data into list<string> in c#?

The maximum number of elements that can be stored in the current implementation of List is, theoretically, Int32.MaxValue - just over 2 billion.

we see that a List can carry a large amount of items. I'm assuming that the compiler doesn't just free up the space for 2 billion times size of T for every new implementation of List<T>, so how does the list dynamically grow? Does it have pointers to noncontiguous spaces in memory?

like image 357
8protons Avatar asked Jul 07 '18 22:07

8protons


People also ask

How does the C language work?

C is what's referred to as a compiled language, meaning you have to use a compiler to turn the code into an executable file before you can run it. The code is written into one or more text files, which you can open, read and edit in any text editor, such as Notepad in Windows, TextEdit on a Mac, and gedit in Linux.

How code is executed in C?

Loader loads the executable module to the main memory for execution. Linker takes the object code generated by an assembler, as input. Loader takes executable module generated by a linker as input. Linker combines all the object modules of a source code to generate an executable module.

How is C language made?

The origin of C is closely tied to the development of the Unix operating system, originally implemented in assembly language on a PDP-7 by Dennis Ritchie and Ken Thompson, incorporating several ideas from colleagues. Eventually, they decided to port the operating system to a PDP-11.


1 Answers

The List<T> class is implemented to use an internal T[] array under the hood. If you initialize it using the List<T>(int) constructor, it will allocate an array of the specified size. If you use the default constructor, it will go for the default capacity of 4, but in this case, the array would only get allocated on the first addition.

Each time you add an element to the list, it will first check whether the capacity has been reached (i.e. whether the existing Count equals Capacity). If so, it will create a fresh array of twice the size as the previous one, copy over all existing elements into it, and then proceed with writing the new element. This will keep happening indefinitely on subsequent element additions, until the hard limit you referenced (Int32.MaxValue) is reached.

Performance-wise, this means that the addition of an element is either an O(1) or an O(n) operation, depending on whether the capacity needs to be increased (as discussed under Add). However, since the capacity is doubled whenever it needs to increase, this reallocation happens with exponentially decreasing frequency as the list grows larger. For example, starting from 4, the capacity increases would happen at 4, 8, 16, 32, 64, 128, … elements. Thus, the total cost of the reallocations when calling Add n times would be roughly 4 + 8 + 16 + … + n/8 + n/4 + n/2, which still corresponds to O(n).

Here's an example showing the state of the internal array along a sequence of addition operations:

                               //   ┌┐
var list = new List<char>();   //   ││   Count:    0
                               //   └┘   Capacity: 0
                               //   ┌───┬───┬───┬───┐
list.Add('h');                 //   │ h │ ░ │ ░ │ ░ │   Count:    1
                               //   └───┴───┴───┴───┘   Capacity: 4
                               //   ┌───┬───┬───┬───┐
list.Add('e');                 //   │ h │ e │ ░ │ ░ │   Count:    2
                               //   └───┴───┴───┴───┘   Capacity: 4
                               //   ┌───┬───┬───┬───┐
list.Add('l');                 //   │ h │ e │ l │ ░ │   Count:    3
                               //   └───┴───┴───┴───┘   Capacity: 4
                               //   ┌───┬───┬───┬───┐
list.Add('l');                 //   │ h │ e │ l │ l │   Count:    4
                               //   └───┴───┴───┴───┘   Capacity: 4
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┐
list.Add('o');                 //   │ h │ e │ l │ l │ o │ ░ │ ░ │ ░ │   Count:    5
                               //   └───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 8
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┐
list.Add(' ');                 //   │ h │ e │ l │ l │ o │   │ ░ │ ░ │   Count:    6
                               //   └───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 8
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┐
list.Add('w');                 //   │ h │ e │ l │ l │ o │   │ w │ ░ │   Count:    7
                               //   └───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 8
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┐
list.Add('o');                 //   │ h │ e │ l │ l │ o │   │ w │ o │   Count:    8
                               //   └───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 8
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
list.Add('r');                 //   │ h │ e │ l │ l │ o │   │ w │ o │ r │ ░ │ ░ │ ░ │ ░ │ ░ │ ░ │ ░ │   Count:    9
                               //   └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 16
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
list.Add('l');                 //   │ h │ e │ l │ l │ o │   │ w │ o │ r │ ░ │ ░ │ ░ │ ░ │ ░ │ ░ │ ░ │   Count:    10
                               //   └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 16
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
list.Add('d');                 //   │ h │ e │ l │ l │ o │   │ w │ o │ r │ l │ d │ ░ │ ░ │ ░ │ ░ │ ░ │   Count:    11
                               //   └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 16

The symbol represents allocated space that is still unused. Those array locations would contain the default value for T. In the case of char, this will be the null character, \0. However, these values would never be visible to the consumer.

When adding multiple elements together through AddRange, only one reallocation is performed at most. If doubling the previous capacity would be insufficient to accommodate all the new elements, then the internal array is increased immediately to the new count instead.

Unlike addition, removing elements doesn't automatically shrink the list. However, you can cause this to happen manually by calling TrimExcess.

As mentioned in the comments, some aspects of the above (such as the default initial capacity of 4) are implementation details derived from the source code for .NET Framework 4.7.2. However, the core principles are well-entrenched and unlikely to change in other/future frameworks.

like image 137
Douglas Avatar answered Sep 21 '22 19:09

Douglas