Why does MSFT C# compile a Fixed "array to pointer decay" and "address of first element" differently?


The .NET C# compiler (.NET 4.0) compiles the fixed statement in a rather peculiar way.

Here's a short but complete program to show you what I am talking about.

using System;

public static class FixedExample {

    public static void Main() {
        byte[] nonempty = new byte[1] {42};
        byte[] empty = new byte[0];

        Good(nonempty);
        Bad(nonempty);

        try {
            Good(empty);
        } catch (Exception e) {
            Console.WriteLine(e.ToString());
            /* continue with next example */
        }
        Console.WriteLine();
        try {
            Bad(empty);
        } catch (Exception e) {
            Console.WriteLine(e.ToString());
            /* continue with next example */
        }
    }

    public static void Good(byte[] buffer) {
        unsafe {
            fixed (byte * p = &buffer[0]) {
                Console.WriteLine(*p);
            }
        }
    }

    public static void Bad(byte[] buffer) {
        unsafe {
            fixed (byte * p = buffer) {
                Console.WriteLine(*p);
            }
        }
    }
}

Compile it with "csc.exe FixedExample.cs /unsafe /o+" if you want to follow along.

Here's the generated IL for the method Good:

Good()

  .maxstack  2
  .locals init (uint8& pinned V_0)
  IL_0000:  ldarg.0
  IL_0001:  ldc.i4.0
  IL_0002:  ldelema    [mscorlib]System.Byte
  IL_0007:  stloc.0
  IL_0008:  ldloc.0
  IL_0009:  conv.i
  IL_000a:  ldind.u1
  IL_000b:  call       void [mscorlib]System.Console::WriteLine(int32)
  IL_0010:  ldc.i4.0
  IL_0011:  conv.u
  IL_0012:  stloc.0
  IL_0013:  ret

Here's the generated IL for the method Bad:

Bad()

  .locals init (uint8& pinned V_0, uint8[] V_1)
  IL_0000:  ldarg.0
  IL_0001:  dup
  IL_0002:  stloc.1
  IL_0003:  brfalse.s  IL_000a
  IL_0005:  ldloc.1
  IL_0006:  ldlen
  IL_0007:  conv.i4
  IL_0008:  brtrue.s   IL_000f
  IL_000a:  ldc.i4.0
  IL_000b:  conv.u
  IL_000c:  stloc.0
  IL_000d:  br.s       IL_0017
  IL_000f:  ldloc.1
  IL_0010:  ldc.i4.0
  IL_0011:  ldelema    [mscorlib]System.Byte
  IL_0016:  stloc.0
  IL_0017:  ldloc.0
  IL_0018:  conv.i
  IL_0019:  ldind.u1
  IL_001a:  call       void [mscorlib]System.Console::WriteLine(int32)
  IL_001f:  ldc.i4.0
  IL_0020:  conv.u
  IL_0021:  stloc.0
  IL_0022:  ret

Here's what Good does:

  1. Get the address of buffer[0].
  2. Dereference that address.
  3. Call WriteLine with that dereferenced value.

Here's what Bad does:

  1. If buffer is null, GOTO 3.
  2. If buffer.Length != 0, GOTO 5.
  3. Store the value 0 in local slot 0.
  4. GOTO 6.
  5. Get the address of buffer[0] and store it in local slot 0.
  6. Dereference the address in local slot 0 (which now holds either 0 or the address of buffer[0]).
  7. Call WriteLine with that dereferenced value (see the C# sketch below).
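In C# terms, the compiler has effectively lowered Bad into something like the following sketch. This is illustrative only: the real IL uses a single pinned local, which plain C# cannot express, so the non-empty branch stands in with a nested fixed statement:

public static class LoweredSketch {
    // Roughly what the IL for Bad does, written back out as C# (a sketch).
    public static void BadLowered(byte[] buffer) {
        unsafe {
            if (buffer == null || buffer.Length == 0) {
                byte * p = null;          // IL_000a-IL_000c: store 0 in the pinned slot
                Console.WriteLine(*p);    // the NullReferenceException happens here
            } else {
                fixed (byte * p = &buffer[0]) {  // IL_000f-IL_0016: ldelema + pin
                    Console.WriteLine(*p);
                }
            }
        }
    }
}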

When buffer is both non-null and non-empty, these two functions do the same thing. Notice that Bad just jumps through a few hoops before getting to the WriteLine function call.

When buffer is null, Good throws a NullReferenceException in the fixed-pointer declarator (byte * p = &buffer[0]). Presumably this is the desired behavior for fixing a managed array, because in general any operation inside of a fixed-statement will depend on the validity of the object being fixed. Otherwise why would that code be inside the fixed block? When Good is passed a null reference, it fails immediately at the start of the fixed block, providing a relevant and informative stack trace. The developer will see this and realize that he ought to validate buffer before using it, or perhaps his logic incorrectly assigned null to buffer. Either way, clearly entering a fixed block with a null managed array is not desirable.

Bad handles this case differently, even undesirably. You can see that Bad does not actually throw an exception until p is dereferenced. It does so in the roundabout way of assigning null to the same local slot that holds p, then later throwing the exception when the fixed block statements dereference p.
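To make the contrast concrete, here is a small demonstration reusing Good and Bad from the program above. Both calls end in a NullReferenceException; the difference is where it originates:

byte[] missing = null;

try { Good(missing); }              // throws at the fixed-pointer declarator;
catch (NullReferenceException) { }  // the stack trace points at the fixed line

try { Bad(missing); }               // p is quietly set to null; the throw only
catch (NullReferenceException) { }  // happens at *p inside the fixed block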

Handling null this way has the advantage of keeping the object model in C# consistent. That is, inside the fixed block, p is still treated semantically as a sort of "pointer to a managed array" that will not, when null, cause problems until (or unless) it is dereferenced. Consistency is all well and good, but the problem is that p is not a pointer to a managed array. It is a pointer to the first element of buffer, and anybody who has written this code (Bad) would interpret its semantic meaning as such. You can't get the size of buffer from p, and you can't call p.ToString(), so why treat it as though it were an object? In cases where buffer is null, there is clearly a coding mistake, and I believe it would be vastly more helpful if Bad would throw an exception at the fixed-pointer declarator, rather than inside the method.
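The claim that p is not a pointer to a managed array is easy to check. A sketch (the out-of-range read at the end is exactly the kind of bug the missing bounds check allows):

byte[] buffer = new byte[] { 1, 2, 3 };
unsafe {
    fixed (byte * p = buffer) {
        // int n = p.Length;         // does not compile: byte* has no members
        // string s = p.ToString();  // does not compile either
        byte beyond = p[3];          // compiles and runs: silently reads whatever
                                     // byte happens to follow the last element
    }
}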

So it seems that Good handles null better than Bad does. What about empty buffers?

When buffer has Length 0, Good throws IndexOutOfRangeException at the fixed-pointer declarator. That seems like a completely reasonable way to handle out of bounds array access. After all, the code &buffer[0] should be treated the same way as &(buffer[0]), which should obviously throw IndexOutOfRangeException.

Bad handles this case differently, and again undesirably. Just as would be the case if buffer were null, when buffer.Length == 0, Bad does not throw an exception until p is dereferenced, and at that time it throws NullReferenceException, not IndexOutOfRangeException! If p is never dereferenced, then the code does not even throw an exception. Again, it seems that the idea here is to give p the semantic meaning of "pointer to a managed array". Yet again, I do not think that anybody writing this code would think of p that way. The code would be much more helpful if it threw IndexOutOfRangeException in the fixed-pointer declarator, thereby notifying the developer that the array passed in was empty, and not null.
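You can verify the no-exception case with a small variant of Bad (the name BadNoDereference is mine, not from the original program):

public static void BadNoDereference(byte[] buffer) {
    unsafe {
        fixed (byte * p = buffer) {
            Console.WriteLine(p == null);  // prints True for new byte[0]
        }
    }
}

// BadNoDereference(new byte[0]) prints "True" and returns normally.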

It looks like fixed (byte * p = buffer) should have been compiled to the same code as fixed (byte * p = &buffer[0]). Also notice that even though buffer could have been any arbitrary expression, its type (byte[]) is known at compile time, so the code generated for Good would work for any such expression.
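In the meantime, a caller can restore the Good-style failure behavior by validating up front. A minimal sketch (the method name Checked is mine):

public static void Checked(byte[] buffer) {
    if (buffer == null)
        throw new ArgumentNullException("buffer");
    if (buffer.Length == 0)
        throw new ArgumentException("buffer must not be empty", "buffer");

    unsafe {
        fixed (byte * p = buffer) {  // now guaranteed to behave like &buffer[0]
            Console.WriteLine(*p);
        }
    }
}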

Edit

In fact, notice that the implementation of Bad actually does the error checking on buffer[0] twice. It does it explicitly at the beginning of the method, and then does it again implicitly at the ldelema instruction.


So we see that Good and Bad are semantically different. Bad is longer, probably slower, certainly does not give us desirable exceptions when we have bugs in our code, and in some cases even fails much later than it should.

For those curious, section 18.6 of the C# 4.0 spec says that the behavior is "implementation-defined" in both of these failure cases:

A fixed-pointer-initializer can be one of the following:

• The token “&” followed by a variable-reference (§5.3.3) to a moveable variable (§18.3) of an unmanaged type T, provided the type T* is implicitly convertible to the pointer type given in the fixed statement. In this case, the initializer computes the address of the given variable, and the variable is guaranteed to remain at a fixed address for the duration of the fixed statement.

• An expression of an array-type with elements of an unmanaged type T, provided the type T* is implicitly convertible to the pointer type given in the fixed statement. In this case, the initializer computes the address of the first element in the array, and the entire array is guaranteed to remain at a fixed address for the duration of the fixed statement. The behavior of the fixed statement is implementation-defined if the array expression is null or if the array has zero elements.

... other cases ...

One last point: the MSDN documentation suggests that the two are "equivalent":

// The following two assignments are equivalent...

fixed (double* p = arr) { /*...*/ }

fixed (double* p = &arr[0]) { /*...*/ }

If the two are supposed to be "equivalent", then why use different error handling semantics for the former statement?

It also appears that extra effort was put into writing the code paths generated in Bad. The compiled code in Good works fine for all the failure cases, and is the same as the code in Bad in non-failure cases. Why implement new code paths instead of just using the simpler code generated for Good?

Why is it implemented this way?

asked Aug 03 '12 by Michael Graczyk



2 Answers

You might have noticed that the IL code you included implements the spec almost line-for-line. It explicitly implements the two failure cases listed in the spec where they are relevant, and omits that code where they are not. So, the simplest reason why the compiler behaves the way it does is "because the spec said so".

Of course, that just leads to two further questions that we might ask:

  • Why did the C# language group choose to write the spec this way?
  • Why did the compiler team choose that specific implementation-defined behavior?

Short of someone from the appropriate teams showing up, we can't really hope to answer either of those questions completely. However, we can take a stab at answering the second one by trying to follow their reasoning.

Recall that the spec says, in the case of supplying an array to a fixed-pointer-initializer, that

The behavior of the fixed statement is implementation-defined if the array expression is null or if the array has zero elements.

Since the implementation is free to choose to do whatever it wants in this case, we can assume that will be whatever reasonable behavior was easiest and cheapest for the compiler team to do.

In this case, what the compiler team chose to do was "throw an exception at the point where your code does something wrong". Consider what the code would be doing if it were not inside a fixed-pointer-initializer and think about what else is happening. In your "Good" example, you are trying to take the address of an object that doesn't exist: the first element in a null/empty array. That's not something you can actually do, so it will produce an exception. In your "Bad" example, you are merely assigning the address of a parameter to a pointer variable; byte * p = null is a perfectly legitimate statement. It is only when you try to WriteLine(*p) that an error happens. Since the fixed-pointer-initializer is allowed to do whatever it wants in this exception case, the simplest thing to do is just permit the assignment to happen, as meaningless as it is.
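A minimal sketch of that distinction; nothing here is specific to fixed, because plain pointer assignment already behaves this way:

unsafe {
    byte * p = null;               // perfectly legitimate, just like in Bad
    Console.WriteLine(p == null);  // True
    // byte b = *p;                // only this dereference would fault
}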

Clearly, the two statements are not precisely equivalent. We can tell this by the fact that the standard treats them differently:

  • &arr[0] is: "The token “&” followed by a variable-reference", and so the compiler computes the address of arr[0]
  • arr is: "An expression of an array-type", and so the compiler computes the address of the array's first element, with the caveat that a null or 0-length array produces the implementation-defined behavior you're seeing.

The two produce equivalent results, so long as there is an element in the array, which is the point that the MSDN documentation is trying to get across. Asking questions about why explicitly undefined or implementation-defined behavior acts the way it does isn't really going to help you solve any particular problems, because you cannot rely on it to be true in the future. (Having said that, I'd of course be curious to know what the thought process was, since you obviously cannot "fix" a null value in memory...)
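For what it's worth, the equivalence in the non-failure case is easy to observe. A sketch stacking the two forms over the same non-empty array:

double[] arr = new double[] { 1.0, 2.0, 3.0 };
unsafe {
    fixed (double * p1 = arr)        // "an expression of an array-type"
    fixed (double * p2 = &arr[0]) {  // "& followed by a variable-reference"
        Console.WriteLine(p1 == p2); // True: same address while both are pinned
    }
}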

answered Oct 02 '22 by Michael Edenfield


So we see that Good and Bad are semantically different. Why?

Because Good is case 1 and Bad is case 2.

Good does not assign an "expression of an array-type". It assigns "the token “&” followed by a variable-reference", so it is case 1. Bad assigns an "expression of an array-type", making it case 2. If this is true, the MSDN documentation is wrong.

In any case, this explains why the C# compiler generates two different (and, in the second case, specialized) code patterns.

Why does case 1 generate such simple code? I am speculating here: taking the address of an array element is probably compiled the same way as using array[index] in a ref-expression. At the CLR level, ref parameters and expressions are just managed pointers. So is the expression &array[index]: it is compiled to a managed pointer that is not pinned but "interior" (the term comes from Managed C++, I think). The GC adjusts it automatically whenever the array moves, so it behaves like a normal object reference.
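That analogy is visible with an ordinary ref parameter, no unsafe code required (the method names here are mine, for illustration):

static void Increment(ref byte element) {
    element++;  // writes through a managed (interior) pointer
}

static void Demo() {
    byte[] data = new byte[] { 41 };
    Increment(ref data[0]);      // compiles to ldelema: an unpinned interior
                                 // pointer that the GC tracks and updates
    Console.WriteLine(data[0]);  // 42
}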

So case 1 gets the usual managed-pointer treatment, while case 2 gets a special, implementation-defined (not undefined) behavior.

This is not answering all of your questions but at least it provides some reasons for your observations. I'm kind of hoping for Eric Lippert to add his answer as an insider.

answered Oct 02 '22 by usr