The .NET C# compiler (.NET 4.0) compiles the fixed statement in a rather peculiar way.
Here's a short but complete program to show you what I am talking about.
using System;

public static class FixedExample {

    public static void Main() {
        byte [] nonempty = new byte[1] {42};
        byte [] empty = new byte[0];

        Good(nonempty);
        Bad(nonempty);

        try {
            Good(empty);
        } catch (Exception e){
            Console.WriteLine(e.ToString());
            /* continue with next example */
        }
        Console.WriteLine();
        try {
            Bad(empty);
        } catch (Exception e){
            Console.WriteLine(e.ToString());
            /* continue with next example */
        }
    }

    public static void Good(byte[] buffer) {
        unsafe {
            fixed (byte * p = &buffer[0]) {
                Console.WriteLine(*p);
            }
        }
    }

    public static void Bad(byte[] buffer) {
        unsafe {
            fixed (byte * p = buffer) {
                Console.WriteLine(*p);
            }
        }
    }
}
Compile it with "csc.exe FixedExample.cs /unsafe /o+" if you want to follow along.
Here's the generated IL for the method Good:
Good()
  .maxstack 2
  .locals init (uint8& pinned V_0)
  IL_0000: ldarg.0
  IL_0001: ldc.i4.0
  IL_0002: ldelema [mscorlib]System.Byte
  IL_0007: stloc.0
  IL_0008: ldloc.0
  IL_0009: conv.i
  IL_000a: ldind.u1
  IL_000b: call void [mscorlib]System.Console::WriteLine(int32)
  IL_0010: ldc.i4.0
  IL_0011: conv.u
  IL_0012: stloc.0
  IL_0013: ret
Here's the generated IL for the method Bad:
Bad()
  .locals init (uint8& pinned V_0, uint8[] V_1)
  IL_0000: ldarg.0
  IL_0001: dup
  IL_0002: stloc.1
  IL_0003: brfalse.s IL_000a
  IL_0005: ldloc.1
  IL_0006: ldlen
  IL_0007: conv.i4
  IL_0008: brtrue.s IL_000f
  IL_000a: ldc.i4.0
  IL_000b: conv.u
  IL_000c: stloc.0
  IL_000d: br.s IL_0017
  IL_000f: ldloc.1
  IL_0010: ldc.i4.0
  IL_0011: ldelema [mscorlib]System.Byte
  IL_0016: stloc.0
  IL_0017: ldloc.0
  IL_0018: conv.i
  IL_0019: ldind.u1
  IL_001a: call void [mscorlib]System.Console::WriteLine(int32)
  IL_001f: ldc.i4.0
  IL_0020: conv.u
  IL_0021: stloc.0
  IL_0022: ret
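Here's what Good does: it takes the address of buffer[0] with ldelema, stores it in the pinned local V_0, dereferences that pointer and prints the byte, and finally clears V_0 to unpin the array.
Here's what Bad does: it copies buffer into the temporary V_1 and, if buffer is null or its length is zero, stores a null pointer in the pinned local V_0. Otherwise it takes the address of element 0 with ldelema and stores that in V_0. Either way, it then dereferences V_0, prints the byte, and clears V_0.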
When buffer is both non-null and non-empty, these two functions do the same thing. Notice that Bad just jumps through a few hoops before getting to the WriteLine function call.
When buffer is null, Good throws a NullReferenceException in the fixed-pointer declarator (byte * p = &buffer[0]). Presumably this is the desired behavior for fixing a managed array, because in general any operation inside of a fixed-statement will depend on the validity of the object being fixed. Otherwise why would that code be inside the fixed block? When Good is passed a null reference, it fails immediately at the start of the fixed block, providing a relevant and informative stack trace. The developer will see this and realize that he ought to validate buffer before using it, or perhaps his logic incorrectly assigned null to buffer. Either way, clearly entering a fixed block with a null managed array is not desirable.
Bad handles this case differently, even undesirably. You can see that Bad does not actually throw an exception until p is dereferenced. It does so in the roundabout way of assigning null to the same local slot that holds p, then later throwing the exception when the fixed block statements dereference p.
Handling null this way has the advantage of keeping the object model in C# consistent. That is, inside the fixed block, p is still treated semantically as a sort of "pointer to a managed array" that will not, when null, cause problems until (or unless) it is dereferenced. Consistency is all well and good, but the problem is that p is not a pointer to a managed array. It is a pointer to the first element of buffer, and anybody who has written this code (Bad) would interpret its semantic meaning as such. You can't get the size of buffer from p, and you can't call p.ToString(), so why treat it as though it were an object? In cases where buffer is null, there is clearly a coding mistake, and I believe it would be vastly more helpful if Bad would throw an exception at the fixed-pointer declarator, rather than inside the method.
So it seems that Good handles null better than Bad does. What about empty buffers?
When buffer has Length 0, Good throws IndexOutOfRangeException at the fixed-pointer declarator. That seems like a completely reasonable way to handle out-of-bounds array access. After all, the code &buffer[0] should be treated the same way as &(buffer[0]), which should obviously throw IndexOutOfRangeException.
Bad handles this case differently, and again undesirably. Just as would be the case if buffer were null, when buffer.Length == 0, Bad does not throw an exception until p is dereferenced, and at that time it throws NullReferenceException, not IndexOutOfRangeException! If p is never dereferenced, then the code does not even throw an exception. Again, it seems that the idea here is to give p the semantic meaning of "pointer to a managed array". Yet again, I do not think that anybody writing this code would think of p that way. The code would be much more helpful if it threw IndexOutOfRangeException in the fixed-pointer declarator, thereby notifying the developer that the array passed in was empty, and not null.
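Until the compiler behaves that way, the fail-fast behavior can be approximated by hand. The following is only a minimal sketch: FirstByte and FixedGuardSketch are hypothetical names that are not part of the program above, and the choice of ArgumentNullException and ArgumentException is an assumption about what a caller would find most useful. It keeps the array form of fixed but validates buffer first, so the failure surfaces before the fixed block is entered. Compile with /unsafe.

```csharp
using System;

public static class FixedGuardSketch
{
    // Hypothetical helper: validates the array before pinning it, so that a
    // null or empty buffer fails before the fixed block rather than at *p.
    public static unsafe byte FirstByte(byte[] buffer)
    {
        if (buffer == null)
            throw new ArgumentNullException("buffer");
        if (buffer.Length == 0)
            throw new ArgumentException("buffer must not be empty", "buffer");

        fixed (byte* p = buffer)
        {
            // buffer is known to have at least one element at this point.
            return *p;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FirstByte(new byte[] { 42 }));

        try
        {
            FirstByte(new byte[0]);
        }
        catch (ArgumentException e)
        {
            // Reports the exception type raised by the guard.
            Console.WriteLine(e.GetType().Name);
        }
    }
}
```

The explicit checks duplicate what the generated code for Bad already tests, but they turn the silent null pointer into an exception that names the offending argument.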
It looks like fixed (byte * p = buffer) should have been compiled to the same code as was fixed (byte * p = &buffer[0]). Also notice that even though buffer could have been any arbitrary expression, its type (byte[]) is known at compile time and therefore the code in Good would work for any arbitrary expression.
Edit
In fact, notice that the implementation of Bad actually does the error checking on buffer[0] twice. It does it explicitly at the beginning of the method, and then does it again implicitly at the ldelema instruction.
So we see that Good and Bad are semantically different. Bad is longer, probably slower, and certainly does not give us desirable exceptions when we have bugs in our code, and even fails much later than it should in some cases.
For those curious, section 18.6 of the spec (C# 4.0) says that behavior is "Implementation-defined" in both of these failure cases:
A fixed-pointer-initializer can be one of the following:
• The token “&” followed by a variable-reference (§5.3.3) to a moveable variable (§18.3) of an unmanaged type T, provided the type T* is implicitly convertible to the pointer type given in the fixed statement. In this case, the initializer computes the address of the given variable, and the variable is guaranteed to remain at a fixed address for the duration of the fixed statement.
• An expression of an array-type with elements of an unmanaged type T, provided the type T* is implicitly convertible to the pointer type given in the fixed statement. In this case, the initializer computes the address of the first element in the array, and the entire array is guaranteed to remain at a fixed address for the duration of the fixed statement. The behavior of the fixed statement is implementation-defined if the array expression is null or if the array has zero elements.
... other cases ...
Last point, the MSDN documentation suggests that the two are "equivalent" :
// The following two assignments are equivalent...
fixed (double* p = arr) { /*...*/ }
fixed (double* p = &arr[0]) { /*...*/ }
If the two are supposed to be "equivalent", then why use different error handling semantics for the former statement?
It also appears that extra effort was put into writing the code paths generated in Bad. The compiled code in Good works fine for all the failure cases, and is the same as the code in Bad in non-failure cases. Why implement new code paths instead of just using the simpler code generated for Good?
Why is it implemented this way?
You might have noticed that the IL code you included implements the spec almost line-for-line. That includes explicitly implementing the two exception cases listed in the spec in the case where they are relevant, and not including the code in the case where they aren't. So, the simplest reason why the compiler behaves the way it does is "because the spec said so".
Of course, that just leads to two further questions that we might ask:
Short of someone from the appropriate teams showing up, we can't really hope to answer either of those questions completely. However, we can take a stab at answering the second one by trying to follow their reasoning.
Recall that the spec says, in the case of supplying an array to a fixed-pointer-initializer, that
The behavior of the fixed statement is implementation-defined if the array expression is null or if the array has zero elements.
Since the implementation is free to choose to do whatever it wants in this case, we can assume that it will be whatever reasonable behavior was easiest and cheapest for the compiler team to implement.
In this case, what the compiler team chose to do was "throw an exception at the point where your code does something wrong". Consider what the code would be doing if it were not inside a fixed-pointer-initializer and think about what else is happening. In your "Good" example, you are trying to take the address of an object that doesn't exist: the first element in a null/empty array. That's not something you can actually do, so it will produce an exception. In your "Bad" example, you are merely assigning the address of a parameter to a pointer variable; byte * p = null is a perfectly legitimate statement. It is only when you try to WriteLine(*p) that an error happens. Since the fixed-pointer-initializer is allowed to do whatever it wants in this exception case, the simplest thing to do is just permit the assignment to happen, as meaningless as it is.
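To see that the assignment itself is harmless, here is a small sketch that enters the fixed block with each kind of array and only inspects the pointer, never dereferencing it. PointerIsNull and NullPointerSketch are hypothetical names introduced for illustration. With a compiler that emits the IL shown in the question, the three calls should report False, True and True, and no exception is thrown. Because the spec leaves these cases implementation-defined, a different compiler is free to behave differently. Compile with /unsafe.

```csharp
using System;

public static class NullPointerSketch
{
    // Hypothetical helper: reports whether the array form of fixed produced
    // a null pointer. The pointer is compared, never dereferenced.
    public static unsafe bool PointerIsNull(byte[] buffer)
    {
        fixed (byte* p = buffer)
        {
            return p == null;
        }
    }

    public static void Main()
    {
        Console.WriteLine(PointerIsNull(new byte[] { 42 })); // non-empty array
        Console.WriteLine(PointerIsNull(new byte[0]));       // empty array
        Console.WriteLine(PointerIsNull(null));              // null reference
    }
}
```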
Clearly, the two statements are not precisely equivalent. We can tell this by the fact that the standard treats them differently:
• &arr[0] is: "The token “&” followed by a variable-reference", and so the compiler computes the address of arr[0]
• arr is: "An expression of an array-type", and so the compiler computes the address of the array's first element, with the caveat that a null or 0-length array produces the implementation-defined behavior you're seeing.
The two produce equivalent results, so long as there is an element in the array, which is the point that the MSDN documentation is trying to get across. Asking questions about why explicitly undefined or implementation-defined behavior acts the way it does isn't really going to help you solve any particular problems, because you cannot rely on it to be true in the future. (Having said that, I'd of course be curious to know what the thought process was, since you obviously cannot "fix" a null value in memory...)
So we see that the Good and Bad are semantically different. Why?
Because Good is case 1 and Bad is case 2.
Good does not assign "An expression of an array-type". It assigns "The token “&” followed by a variable-reference", so it is case 1. Bad assigns "An expression of an array-type", making it case 2. If this is true, the MSDN documentation is wrong.
In any case this explains why the C# compiler creates two different (and in the second case specialized) code patterns.
Why does case 1 generate such simple code? I am speculating here: Taking the address of an array element is probably compiled the same way as using array[index] in a ref-expression. At the CLR level, ref parameters and expressions are just managed pointers. So is the expression &array[index]: It is compiled to a managed pointer that is not pinned but "interior" (this term comes from Managed C++ I think). The GC fixes it automatically. It behaves like a normal object reference.
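The parallel with ref can be checked without any unsafe code. In this sketch (Increment, Probe and RefElementSketch are hypothetical names), passing ref buffer[0] forms a managed pointer to the first element, and it fails in the same two ways that Good does: the empty array raises IndexOutOfRangeException and the null reference raises NullReferenceException, both before the callee runs. This only illustrates the matching exception behavior. It does not prove that the compiler shares a code path between the two constructs.

```csharp
using System;

public static class RefElementSketch
{
    // Hypothetical helper: receives a managed pointer (ref) to a byte.
    public static void Increment(ref byte value)
    {
        value++;
    }

    // Returns the name of the exception thrown while forming ref buffer[0],
    // or "none" when the call succeeds.
    public static string Probe(byte[] buffer)
    {
        try
        {
            Increment(ref buffer[0]);
            return "none";
        }
        catch (Exception e)
        {
            return e.GetType().Name;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Probe(new byte[] { 42 })); // none
        Console.WriteLine(Probe(new byte[0]));       // IndexOutOfRangeException
        Console.WriteLine(Probe(null));              // NullReferenceException
    }
}
```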
So case 1 gets the usual managed pointer treatment while case 2 gets a special, implementation-defined (not undefined) behavior.
This is not answering all of your questions but at least it provides some reasons for your observations. I'm kind of hoping for Eric Lippert to add his answer as an insider.