I read about collections in .NET nowadays. As known, there is some advantages using generic collections over non-generic: they are type-safety and there is no casting, no boxing/unboxing. That's why generic collections have a performance advantage.
If we consider that non-generic collections store every member as object
, then we can think that generics have also memory advantage. However, I didn't found any information about memory usage difference.
Can anyone clarify about the point?
A Generic collection is a class that provides type safety without having to derive from a base collection type and implement type-specific members. A Non-generic collection is a specialized class for data storage and retrieval that provides support for stacks, queues, lists and hashtables.
Generics are similar to collections, but implemented using Type parameters. Generic collections accept a type parameter and accept the elements of only those type for which the generic collection is instantiated. These enforce strict type checks.
Better Performance with Generics As you have seen in the above example, at the time of fetching the data from the ArrayList collection we need to do type casting which causes performance degrades. But at the time of fetching the data from the generic List, we don't require to do type casting.
cmd/compile: generic functions are significantly slower than identical non-generic functions in some cases #50182.
If we consider that non-generic collections store every member as object, then we can think that generics have also memory advantage. However, I didn't found any information about memory usage difference. Can anyone clarify about the point?
Sure. Let's consider an ArrayList
that contains int
s vs a List<int>
. Let's suppose there are 1000 int
s in each list.
In both, the collection type is a thin wrapper around an array -- hence the name ArrayList
. In the case of ArrayList
, there's an underlying object[]
that contains at least 1000 boxed ints. In the case of List<int>
, there's an underlying int[]
that contains at least 1000 int
s.
Why did I say "at least"? Because both use a double-when-full strategy. If you set the capacity of a collection when you create it then it allocates enough space for that many things. If you don't, then the collection has to guess, and if it guesses wrong and you need more capacity, then it doubles its capacity. So, best case, our collection arrays are exactly the right size. Worst case, they are possibly twice as big as they need to be; there could be room for 2000 objects or 2000 ints in the arrays.
But let's suppose for simplicity that we're lucky and there are about 1000 in each.
To start with, what's the memory burden of just the array? An object[1000]
takes up 4000 bytes on a 32 bit system and 8000 bytes on a 64 bit system, just for the references, which are pointer sized. An int[1000]
takes up 4000 bytes regardless. (There are also a few extra bytes taken up by array bookkeeping, but these costs are small compared to the marginal costs.)
So already we see that the non-generic solution possibly consumes twice as much memory just for the array. What about the contents of the array?
Well, the thing about value types is they are stored right there in their own variable. There is no additional space beyond those 4000 bytes used to store the 1000 integers; they get packed right into the array. So the additional cost is zero for the generic case.
For the object[]
case, each member of the array is a reference, and that reference refers to an object; in this case, a boxed integer. What's the size of a boxed integer?
An unboxed value type doesn't need to store any information about its type, because its type is determined by the type of the storage its in, and that's known to the runtime. A boxed value type needs to somewhere store the type of the thing in the box, and that takes space. It turns out that the bookkeeping overhead for an object in 32 bit .NET is 8 bytes, and 16 on 64 bit systems. That's just the overhead; we of course need 4 bytes for the int. But wait, it gets worse: on 64 bit systems, the box must be aligned to an 8 byte boundary, so we need another 4 bytes of padding on 64 bit systems.
Add it all up: Our int[]
takes about 4KB on both 64 and 32 bit systems. Our object[]
containing 1000 ints takes about 16KB on 32 bit systems, and 32K on 64 bit systems. So the memory efficiency of an int[]
vs an object[]
is either 4 or 8 times worse for the non-generic case.
But wait, it gets even worse. That's just size. What about access time?
To access an integer from an array of integers, the runtime must:
To access an integer from an array of boxed integers, the runtime must:
That's a lot more steps, so it takes a lot longer.
BUT WAIT IT GETS WORSE.
Modern processors use caches on the chip itself to avoid going back to main memory. An array of 1000 plain integers is highly likely to end up in the cache so that accesses to the first, second, third, etc, members of the array in quick succession are all pulled from the same cache line; this is insanely fast. But boxed integers can be all over the heap, which increases cache misses, which greatly slows down access even further.
Hopefully that sufficiently clarifies your understanding of the boxing penalty.
What about non-boxed types? Is there a significant difference between an array list of strings, and a List<string>
?
Here the penalty is much, much smaller, since an object[]
and a string[]
have similar performance characteristics and memory layouts. The only additional penalty in this case is (1) not catching your bugs until runtime, (2) making the code harder to read and edit, and (3) the slight penalty of a run-time type check.
then we can think that generics have also memory advantage
This assumption is false, it only applies on value-types. So considder this:
new ArrayList { 1, 2, 3 };
This will implicetly cast every integer into object
(known as boxing) in order to store it into your ArrayList
. This will cause your memory-overhead here, because an object
surely is bigger than a simple int
.
For reference-types there´s no difference however as there´s no need for boxing.
Using the one or the other shouldn´t be driven bei neither any performance- nor memory-issues. However you should ask yourself what you want to do with the results. In particular if you know the type(s) stored in your collection at compile-time, there´s no reason to not put this information into the compile-process by using the right generic type-argument.
Anyway you should allways use generic collections instead of non-generic ones because of the mentioned type-safety.
EDIT: Your actual question if using a non-generic collection or a generic version is quite pointless: allways use the generic one. But not because of its memory-usage. See this:
ArrayList a = new ArrayList { 1, 2, 3};
vs.
List<object> a = new List<object> { 1, 2, 3 };
Both lists will consume same amount of memory, although the second one is generic. That´s because they both box your integers into object
. So the answer to the question has nothing to do with memory.
On te other saying for reference-types there´s no memory-differencee at all:
ArrayList a = new ArrayList { myInstance, anotherInstance }
vs.
List<MyClass> a = new List<MyClass> { myInstance, anotherInstance }
will produce the same memory-outcome. However the second one is far easier to maintain as you can work with the instances directly without casting them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With