I was wondering if there is any good reference (website or even better, a book) where I can find information about the internal implementation of the commonly used collections like
Dictionary<TKey, TValue>
List<T>
Queue<T>
Stack<T>
By internal implementation, I mean how they use a dynamic array to store their data, how often do they resize it, what is the time and space complexity for the common operations.
Of course, if anybody feels he can provide this information in this thread, you are more than welcome!
List<T>
:List<T>
has two properties, Capacity
and Count
that help clarify when the resizing takes place. Capacity
is the length of the internal array at any given time and Count
is the number of elements added in the List. If you have an estimate of the number of elements to be added in the list, Capacity
can be initialized (by choosing the appropriate constructor), which will lead to fewer or no resizings, and thus better performance.
Resizing (i.e. creation of a new larger array and copy of elements one by one to the new array) happens when Add<T>()
method is called and the array is already full (Count == Capacity
). The Capacity of the new array is doubled (initially, if not set by the user, it starts from 0, then 4 and then it gets doubled each time more space needed):
List<int> list = new List<int>();
//Capacity = 0, Count = 0
list.Add(52);
//Capacity = 4, Count = 1
list.Add(34);
list.Add(2);
list.Add(87);
//Capacity = 4, Count = 4
list.Add(56);
//Capacity = 8, Count = 5
For large n
, the time complexity for adding a new element is amortized constant O(1). The lookup by index is constant O(1), and the insertion or removal of an element at a given index is linear O(n) because it involves shifting the rest of elements one position (to the right or left, respectively). The space used for the internal array is of course linear to the elements of the array, varying from n
to 2n
(or, if this makes sense: Math.Pow(2, Math.Ceiling(Math.Log(n, 2)))
:).
Queue<T>
and Stack<T>
:The resizing of Queue's and Stack's internal array works similarly to what described for the List<T>
.The common operations are efficient O(1) (internally, indexes are kept for Queue's head and tail elements). So, enqueuing an element in the queue or pushing in the stack needs amortized constant time, dequeuing/poping needs constant time.
Dictionary<TKey, TValue>
:Dictionary works differently, and it is nicely described here.
The exact implementation details of each of these would take a long explanation (for each one). Instead I would refer you to J. Albahari's book C# 5.0 In a Nutshell.
However, I can give you a table for memory/time consideration for common operations for the Dictonary classes. These performance times are in milliseconds, to perform 50,000 operations on a dictionary with integer keys and values, on a 1.5GHz PC.
Type Internal Retrieve by Memeory Speed Random Speed Seq Speed Retrieval
Structure Index? Overhead Insertion Insertion by Key
Unsorted
Dictionary<T> Hashtable No 22 30 30 20
Hashtable Hashtable No 38 50 50 30
ListDictonary Linked List No 36 50,000 50,000 50,000
OrderedDictionary Hashtable + Yes 59 70 70 40
Array
Sorted
SortedDictionary Red-Black No 20 130 100 120
<K, V> Tree
SortedList <K, V> 2xArray Yes 2 3,300 30 40
SortedList 2xArray Yes 27 4,500 100 180
Sorry I can't provide this for the others you require.
I hope this is of some use.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With