In this MSDN Magazine article, the author states (emphasis mine):
Note that boxing always creates a new object and copies the unboxed value's bits to the object. On the other hand, unboxing simply returns a pointer to the data within a boxed object: no memory copy occurs. However, it is commonly the case that your code will cause the data pointed to by the unboxed reference to be copied anyway.
I'm confused by the sentence I've emphasized ("unboxing simply returns a pointer to the data within a boxed object") and the sentence that follows it. From everything else I've read, including this MSDN page, I've never before heard that unboxing just returns a pointer to the value on the heap. I was under the impression that unboxing would leave you with a variable holding a copy of the value on the stack, just as you began with. After all, if my variable contains "a pointer to the value on the heap", then I haven't got a value type; I've got a pointer.
Can someone explain what this means? Was the author on crack? (There is at least one other glaring error in the article.) And if this is true, what are the cases where "your code will cause the data pointed to by the unboxed reference to be copied anyway"?
I just noticed that the article is nearly 10 years old, so maybe this is something that changed very early in the life of .NET.
The article is accurate. It does, however, describe what really goes on at run time, not what the compiler-generated IL looks like. After all, a .NET program never executes IL directly; it executes the machine code that the JIT compiler generates from the IL.
And the unbox opcode does indeed compile to code that produces a pointer to the bits on the heap that represent the value type's value. The JIT generates a call to a small helper function in the CLR named "JIT_Unbox"; if you have the SSCLI20 source code, it is in clr\src\vm\jithelpers.cpp. The Object::GetData() function returns the pointer.
From there, the value most commonly gets copied into a CPU register first, and from the register it may then be stored somewhere. That doesn't have to be the stack: it could be a field of a reference-type object (the GC heap), a static variable (the loader heap), or an argument pushed on the stack for a method call. Or the CPU register could be used as-is when the value is used in an expression.
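To make that copying concrete, here is a minimal C# sketch (assuming C# 9 or later for top-level statements; the names `local`, `array`, and `AddOne` are illustrative only). Whether the unboxed value lands in a local, in an array element on the GC heap, or in a method argument, the unboxing cast copies it out of the box, so changing the copy never touches the box:

```csharp
using System;

object boxed = 42;            // box: allocates a new object and copies 42 into it

// Each of these unboxing casts compiles to unbox.any, which reads the int
// out of the box and copies it to wherever the result is stored.
int local = (int)boxed;       // copied into a local (a register or stack slot)
int[] array = new int[1];
array[0] = (int)boxed;        // copied into an array element on the GC heap

static int AddOne(int value) => ++value;   // hypothetical helper taking an int by value
int result = AddOne((int)boxed);           // copied into a method argument

local++;                      // mutating the copies...
array[0] += 10;

Console.WriteLine($"{boxed} {local} {array[0]} {result}"); // ...leaves the box at 42: "42 43 52 43"
```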
While debugging, right-click the editor window and choose "Go To Disassembly" to see the machine code.
The author of the original article must have been referring to what's happening down at the IL level. There are two unboxing opcodes: unbox and unbox.any.
According to MSDN, regarding unbox.any:
When applied to the boxed form of a value type, the unbox.any instruction extracts the value contained within obj (of type O), and is therefore equivalent to unbox followed by ldobj.
and regarding unbox:
[...] unbox is not required to copy the value type from the object. Typically it simply computes the address of the value type that is already present inside of the boxed object.
So, the author knew what he was talking about.
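The equivalence quoted from MSDN can be checked with a small sketch that emits both forms at run time through System.Reflection.Emit. It assumes a runtime that supports DynamicMethod (.NET Framework 2.0+ or modern .NET); the `Build` helper and the method names `ViaUnbox` and `ViaUnboxAny` are hypothetical:

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical helper: wraps the emitted body in a Func<object, int>.
static Func<object, int> Build(string name, Action<ILGenerator> emitBody)
{
    var method = new DynamicMethod(name, typeof(int), new[] { typeof(object) });
    ILGenerator il = method.GetILGenerator();
    il.Emit(OpCodes.Ldarg_0);   // push the boxed object
    emitBody(il);
    il.Emit(OpCodes.Ret);
    return (Func<object, int>)method.CreateDelegate(typeof(Func<object, int>));
}

// unbox pushes a managed pointer (int&) to the data inside the box;
// ldobj then copies the int it points to onto the evaluation stack.
var viaUnbox = Build("ViaUnbox", il =>
{
    il.Emit(OpCodes.Unbox, typeof(int));
    il.Emit(OpCodes.Ldobj, typeof(int));
});

// unbox.any does both steps in one instruction (what C# emits for (int)obj).
var viaUnboxAny = Build("ViaUnboxAny", il => il.Emit(OpCodes.Unbox_Any, typeof(int)));

object boxed = 42;
Console.WriteLine($"{viaUnbox(boxed)} {viaUnboxAny(boxed)}"); // "42 42"
```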
This little fact about unbox makes certain nifty optimizations possible when working directly with IL. For example, if you have a boxed int that you need to pass to a function accepting a ref int, you can just emit an unbox opcode, and the reference to the int will be ready on the evaluation stack for the function to operate on. In this case the function will change the actual contents of the box, something that is quite impossible at the C# level. It saves you from having to allocate a temporary local variable, unbox the int into it, pass a ref to that local to the function, and then create a new box to re-box the int, discarding the old one.
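Here is a minimal sketch of that trick, again using System.Reflection.Emit and assuming a modern .NET runtime. Interlocked.Increment(ref int) stands in for "a function accepting a ref int", and `IncrementInPlace` is a hypothetical name. The pointer produced by unbox is passed straight to the ref parameter, so the same box object ends up holding the incremented value:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Threading;

// Interlocked.Increment(ref int) stands in for "a function accepting a ref int".
MethodInfo increment = typeof(Interlocked).GetMethod(
    nameof(Interlocked.Increment), new[] { typeof(int).MakeByRefType() })
    ?? throw new InvalidOperationException("Interlocked.Increment(ref int) not found");

// Emits: void IncrementInPlace(object box) { Interlocked.Increment(ref <int inside box>); }
var method = new DynamicMethod("IncrementInPlace", typeof(void), new[] { typeof(object) });
ILGenerator il = method.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);              // push the boxed object
il.Emit(OpCodes.Unbox, typeof(int));   // replace it with an int& into the box's data
il.Emit(OpCodes.Call, increment);      // pass that pointer as the ref int argument
il.Emit(OpCodes.Pop);                  // discard Increment's int return value
il.Emit(OpCodes.Ret);
var incrementInPlace = (Action<object>)method.CreateDelegate(typeof(Action<object>));

object boxed = 41;
object sameBox = boxed;                // a second reference to the same box
incrementInPlace(boxed);               // no temporary local, no re-boxing
Console.WriteLine($"{boxed} {ReferenceEquals(boxed, sameBox)}"); // "42 True"
```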
Of course, at the C# level you cannot do any such optimization, so the compiler-generated code will almost always copy the value out of the boxed object before making any further use of it.
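For contrast, a sketch of the round trip that C# code performs instead (assuming C# 9 or later; `original`, `box`, and `temp` are illustrative names). The ref argument points at a local copy, and re-boxing allocates a new object, so the old box keeps its old value:

```csharp
using System;
using System.Threading;

object original = 41;
object box = original;

// Copy out, modify the copy, box again.
int temp = (int)box;             // unbox.any: copy the int into a temporary local
Interlocked.Increment(ref temp); // the ref points at the local, not at the box
box = temp;                      // re-box: allocates a brand-new object

Console.WriteLine($"{original} {box} {ReferenceEquals(original, box)}"); // "41 42 False"
```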