
How does the runtime know the exact type of a boxed value type?

I understand what boxing is: a value type is boxed into an object (a reference type) and is then stored on the managed heap. What I can't get my head around is unboxing.

Unboxing converts the object back to the value type:

int i = 123;          // A value type
object box = i;       // Boxing
int j = (int)box;     // Unboxing

Fine. But if I try to unbox to a different value type, for example long in the example above, it throws an InvalidCastException:

long d = (long)box;

This suggests that the runtime implicitly knows the actual type of the value boxed inside the "box" object. If I am right, where is this type information stored?

EDIT:

What confuses me is that int is implicitly convertible to long:

int i = 123;
long lng = i;

is perfectly fine because no boxing/unboxing is involved.

Arpit Khandelwal asked Jul 30 '14 11:07





4 Answers

When a value is boxed it gets an object header, the same kind that every object deriving from System.Object has. The value follows that header. The header contains two fields. One is the "syncblk", which has various uses that are beyond the scope of the question. The second field describes the type of the object.

That's the one you are asking about. It has various names in the literature, most commonly "type handle" or "method table pointer". The latter is the most accurate description: it is a pointer to the information the CLR keeps for every type it loads. Lots of framework features depend on it, Object.GetType() of course, but also any cast in your code and the is and as operators. These casts are safe, so you can't turn a Dog into a Cat; the type handle provides this guarantee. The method table pointer for your boxed int points to the method table for System.Int32.
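You can't read the method table pointer directly from safe C#, but you can observe its effect. A minimal sketch, assuming a plain console program (the class and method names are made up for illustration):

```csharp
using System;

public static class BoxedTypeDemo
{
    // Hypothetical helper: returns the name of the runtime type recorded for a boxed value.
    public static string RuntimeTypeName(object box) => box.GetType().ToString();

    public static void Main()
    {
        int i = 123;
        object box = i;                                   // boxing

        Console.WriteLine(RuntimeTypeName(box));          // System.Int32
        Console.WriteLine(box is int);                    // True
        Console.WriteLine(box is long);                   // False
        Console.WriteLine(box.GetType() == typeof(int));  // True
    }
}
```

The static type of the variable is object, yet GetType() and the is operator still report System.Int32, because the type travels with the boxed object itself.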

Boxing was very common in .NET 1.x, before generics became available. All of the common collection types stored object instead of T. So putting an element in the collection required (implicit) boxing, getting it out again required explicit unboxing with a cast.

To make this efficient, it was pretty important that the jitter didn't need to consider the possibility that a conversion would be required, because that takes a lot more work. So the C# language included the rule that unboxing to another type is illegal. All that's needed now is a check on the type handle to ensure it is the expected type. The jitter directly compares the method table pointer to the one for System.Int32 in your case, and the value embedded in the object can be copied directly without any conversion concerns. Pretty fast, as fast as it can possibly be: this can all be done with inline machine code without any CLR call.

This rule is specific to C#; VB.NET doesn't have it. It is a typical trade-off between those two languages: C#'s focus is on speed, VB.NET's on convenience. Converting to another type when unboxing isn't otherwise a problem, since all simple value types implement IConvertible. You write it explicitly in your code, using the Convert helper class:
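The exact-type check is easy to see from C#. The sketch below is only an illustration (TryUnboxAsLong is a made-up helper, not a framework API): unboxing the boxed int straight to long fails, while unboxing to int first and then widening works.

```csharp
using System;

public static class UnboxDemo
{
    // Hypothetical helper: attempts a direct unbox to long and reports whether it succeeded.
    public static bool TryUnboxAsLong(object box, out long value)
    {
        try
        {
            value = (long)box;   // succeeds only if box actually holds a System.Int64
            return true;
        }
        catch (InvalidCastException)
        {
            value = 0;
            return false;
        }
    }

    public static void Main()
    {
        object box = 123;                                   // boxed System.Int32

        Console.WriteLine(TryUnboxAsLong(box, out _));      // False
        long viaInt = (long)(int)box;                       // unbox as int, then convert to long
        Console.WriteLine(viaInt);                          // 123
        Console.WriteLine(TryUnboxAsLong(123L, out _));     // True
    }
}
```

The double cast (long)(int)box makes the two steps explicit: the inner cast is the unboxing, the outer one is an ordinary int-to-long conversion.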

        int i = 123;                    // A value type
        object box = i;                 // Boxing
        long j = Convert.ToInt64(box);  // Conversion + unboxing

Which is pretty similar to the code that the VB.NET compiler auto-generates.

Hans Passant answered Oct 02 '22 19:10


It's because the box instruction records the value type (via its type token) in the resulting object (see MSDN). When you unbox the value from the object, its type (and size in memory) has to be known, so you must cast the object to the original value type.

In your second example you don't even need a cast from int to long, because that conversion is implicit.

Pavel Pája Halbich answered Oct 04 '22 19:10


Boxing does not move the value from the stack to the heap. It creates a copy of the value on the heap and stores a reference to that copy in a new slot on the stack, so the original variable stays where it was. The type is recorded with the heap copy rather than looked up from the original variable: at unboxing time the runtime compares the type stored in the boxed object with the type you are casting to, and throws if they don't match. So when unboxing you have to use the same type that you boxed.
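That boxing makes a copy can be checked directly. A small sketch (the class and method names are hypothetical): change the original variable after boxing and the box still holds the old value.

```csharp
using System;

public static class BoxCopyDemo
{
    // Hypothetical helper: boxes a value, overwrites the original variable,
    // and returns what the box holds afterwards.
    public static int BoxedValueAfterMutation(int original, int replacement)
    {
        int i = original;
        object box = i;      // copies the current value of i into a new heap object
        i = replacement;     // changes only the local variable, not the box
        return (int)box;     // the box still holds the value captured when it was boxed
    }

    public static void Main()
    {
        Console.WriteLine(BoxedValueAfterMutation(123, 456));  // 123
    }
}
```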

Ritesh Hingorani answered Oct 04 '22 19:10


Every reference object has a bunch of metadata associated with it. This includes the exact type of the given object (which is why you can have type safety at all).

While the int is held by value, it carries no such metadata (not that it matters, because the compiler knows its type statically), but once you box it, a new object is created with all the necessary metadata. This also means that while an int is just 4 bytes, a boxed int is much more than that: you've now got a reference (4 or 8 bytes), the value itself (4 bytes) and the metadata (which includes the specific type handle). This is very different from e.g. C++, which allows you to cast any pointer to a pointer of any type (leaving you to deal with the errors when you cast it wrong).

Again, all the by-reference objects have this metadata. This is quite an important part of the cost of reference types, but it is also the means by which you can be sure of type safety. This also nicely shows how expensive an ArrayList of int can really be, and why int[] or List<int> is much more efficient: even ignoring the costs of allocating (and more importantly collecting) heap objects and the boxing and unboxing itself, the 4 byte int could suddenly be 20 bytes, just because you're storing a reference to it :)
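To make the ArrayList versus List<int> difference concrete, here is a rough sketch (method names are made up; it shows where boxing happens, it does not measure memory):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CollectionBoxingDemo
{
    // ArrayList stores object, so every int in it is a separate boxed heap object.
    public static long SumArrayList(ArrayList items)
    {
        long sum = 0;
        foreach (object item in items)
            sum += (int)item;    // must unbox as int, the exact type that was boxed
        return sum;
    }

    // List<int> stores the ints themselves; no boxing, no per-element header.
    public static long SumList(List<int> items)
    {
        long sum = 0;
        foreach (int item in items)
            sum += item;         // plain int, widened to long implicitly
        return sum;
    }

    public static void Main()
    {
        var boxed = new ArrayList { 1, 2, 3 };     // each Add boxes its argument
        var unboxed = new List<int> { 1, 2, 3 };   // values stored inline

        Console.WriteLine(SumArrayList(boxed));    // 6
        Console.WriteLine(SumList(unboxed));       // 6
    }
}
```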

Luaan answered Oct 04 '22 19:10