
Integral type promotion inconsistency

Tags:

c#

using System;

public class Tester
{
    public static void Main()
    {
        const uint x=1u;
        const int y=-1;
        Console.WriteLine((x+y).GetType()); // System.Int64
        // Let's refactor and inline y... oops!
        Console.WriteLine((x-1).GetType()); // System.UInt32
    }
}

Imagine the code above being used in the following case:

public long Foo(uint x)
{
    const int y = -1;
    var ptr = anIntPtr.ToInt64() + (x + y) * 4096;
    return ptr;
}

It looks like it's perfectly safe to inline y, but it's actually not. This inconsistency in the language itself is counter-intuitive and is plain dangerous. Most programmers would simply inline y, but you'd actually end up with an integer overflow bug. In fact, if you write code such as the above, you'd easily have the next person working on the same piece of code inline y without even thinking twice.
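
To make the hazard concrete, here is a minimal sketch (the value 0 for x is assumed purely for illustration) showing how the two spellings diverge:

uint x = 0;
const int y = -1;

long a = x + y; // uint + int promotes both operands to long, so a == -1
long b = x - 1; // the literal 1 becomes uint, so uint - uint wraps: b == 4294967295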

I argue that this is a very counter-productive language design issue of C#.

First question, where is this behaviour defined in the C# specs and why was it designed this way?

Second question, (1).GetType() and (-1).GetType() both give System.Int32. Why then does it behave differently from const int y=-1?

Third question, if it implicitly gets converted to uint, then how do we explicitly tell the compiler it's a signed integer (1i isn't valid syntax!)?

Last question, this can't be a desired behaviour intended by the language design team (Eric Lippert to chime in?), can it?

asked Sep 29 '17 by Zach Saw

2 Answers

This behaviour is described by section 6.1.9 of the C# standard, Implicit constant expression conversions:

• A constant-expression (§7.19) of type int can be converted to type sbyte, byte, short, ushort, uint, or ulong, provided the value of the constant-expression is within the range of the destination type.

So you have const uint x = 1u; and the constant expression (x - 1).

According to this rule, the literal 1 in x - 1 is a constant expression of type int, but because its value is within the range of uint the compiler converts it to uint. The subtraction is therefore performed on two uints, and the result is of type uint.

Note that here the compiler is treating the 1 as unsigned.

If you change the expression to (x + -1), the -1 is signed and the result becomes long. (Here the - in -1 is the unary minus operator: it produces an int constant with the value -1, which is out of range for uint, so the compiler can no longer convert it to uint like it could the plain 1; instead both operands are promoted to long.)
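
A quick side-by-side sketch of the two cases:

const uint x = 1u;
Console.WriteLine((x - 1).GetType());  // System.UInt32: the constant 1 is converted to uint
Console.WriteLine((x + -1).GetType()); // System.Int64: -1 cannot become a uint, so both operands promote to long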

What about x - 2? The literal 2 still fits in a uint, so the expression is still uint - uint; but now the constant result, 1u - 2u, would overflow a uint, and if you make that change you get a compile error stating exactly that.

That's because of another part of the C# spec, in section 7.19 Constant Expressions which states:

The compile-time evaluation of constant expressions uses the same rules as run-time evaluation of non-constant expressions, except that where run-time evaluation would have thrown an exception, compile-time evaluation causes a compile-time error to occur.

In this case, the constant calculation overflows when evaluated as checked, so the compiler balks.
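
For example (a sketch; the commented-out line is the one that does not compile):

const uint x = 1u;
// const uint z = x - 2;           // compile-time error: the constant 1u - 2u overflows a uint
var w = unchecked(x - 2);          // compiles: unchecked constant evaluation wraps around
Console.WriteLine(w);              // 4294967295
Console.WriteLine(w.GetType());    // System.UInt32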


With regard to this:

const uint x = 1u;
const int y = -1;
Console.WriteLine((x + y).GetType()); // Long

That's the same as this:

Console.WriteLine((1u + -1).GetType()); // Long

This is because the -1 is of type int and the 1u is of type uint.

Section 7.3.6.2 Binary numeric promotions describes this:

• Otherwise, if either operand is of type uint and the other operand is of type sbyte, short, or int, both operands are converted to type long.

(I omitted the part not relevant to this specific expression.)
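
The same promotion applies when the operands are not constants, as a small sketch shows:

uint a = 1;
int b = -1;
short c = -1;
Console.WriteLine((a + b).GetType()); // System.Int64: uint with int promotes both operands to long
Console.WriteLine((a + c).GetType()); // System.Int64: uint with short promotes to long as well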


Addendum: I just wanted to point out a subtle difference in the unary minus (aka "negation") operator between constant and non-constant values.

According to the standard:

If the operand of the negation operator is of type uint, it is converted to type long, and the type of the result is long.

That is true for variables:

var p = -1;
Console.WriteLine(p.GetType()); // int

var q = -1u;
Console.WriteLine(q.GetType()); // long

var r = 1u;
Console.WriteLine(r.GetType()); // uint

Although in a constant expression involving a uint the compiler converts the literal 1 to uint (so as to keep the whole expression a uint), -1 is always treated as an int, since its value cannot fit in a uint.
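
Putting the two behaviours next to each other (a sketch using a uint constant):

const uint cx = 1u;
Console.WriteLine((cx - 1).GetType()); // System.UInt32: the constant 1 becomes uint
Console.WriteLine((-cx).GetType());    // System.Int64: negating a uint always yields a long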

I do agree with the OP - this is very subtle stuff, leading to various surprises.

answered Oct 07 '22 by Matthew Watson


First question, where is this behaviour defined in the C# specs

Your first question is the answerable one, and it got answered in Matthew Watson's excellent answer.

why was it designed this way?

All design processes require making tradeoffs between a variety of competing design goals. The design goals of C# included such diverse elements as familiarity to C++ developers, ability to interoperate with unmanaged libraries that use non-.NET-friendly conventions such as unsigned integer types, the ability to write a compiler that figures out what you meant in possibly ambiguous situations, but still informs you when it looks like you did something wrong, and so on.

"Values can be seamlessly substituted for symbols evaluating to those values" is a good principle of language design. But it's not the only one. Since several of these goals are, in your case, contradictory, something's got to give. (Also, as I note below, you aren't substituting values!)

I agree with you that the fact that x + -1 and x - 1 have different types is weird. What type would you like them both to be?

Let's suppose you want them both to be long. Now we have the following problem: what is the type of x - x? If it is uint, because we have the difference of two uints, then we have the weirdness that x - x and x - 1 have different types. If it is long, then we have the weirdness that the difference of two uints that fits into a uint is not a uint.

Let's suppose you want them both to be uint. Then should x + anySignedInt be uint? Why should it be uint instead of int? Surely if we have uint 2 and int -3, then 2 + -3 should be the int -1.

No matter what you do, you end up with a weird situation. That's because unsigned quantities don't obey the usual rules of arithmetic. The language design team is doing the best they can with a bad situation.

The exact details of how these decisions were arrived at 17 years ago are lost to the mists of time.

Second question, the types of 1 and -1 are System.Int32. Why then is it behaving differently to const int y=-1?

I assume your question is "why are x + y and x - 1 analyzed differently when plainly they are equivalent expressions?", but they are not the same expression. x + y and x + -1 are the same expression, and they are analyzed the same; the sum of a uint and an int constant which does not fit into a uint promotes both to long. The difference of two uints is a uint.

Your fundamental error is that you believe that addition of a negative, and subtraction of a positive are the same thing. In unsigned arithmetic they are not because in unsigned arithmetic there's no such thing as "addition of a negative". There are no negatives!
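
In code (a sketch, assuming the default unchecked arithmetic and x == 0), the two operations really do behave differently:

uint x = 0;
Console.WriteLine(x - 1);    // 4294967295: uint arithmetic wraps; there is no -1 in uint
Console.WriteLine(x + (-1)); // -1: the int operand forces promotion to long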

if it implicitly gets converted to uint, then how do we explicitly tell the compiler it's a signed integer (1i isn't a valid syntax!)?

I don't understand the question. You tell the compiler the types of things with casts, but I don't think that's what you're asking.
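
For instance, a cast on either operand is enough to force signed arithmetic (a sketch):

const uint x = 1u;
Console.WriteLine((x - 1).GetType());       // System.UInt32
Console.WriteLine((x - (long)1).GetType()); // System.Int64: the cast makes the subtraction signed
Console.WriteLine(((long)x - 1).GetType()); // System.Int64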

Last question, this can't be a desired behaviour intended by the language design team (Eric Lippert to chime in?), can it?

I no longer speak on behalf of the language design team, but I can tell you what I would say had you asked this question when I was:

The language design team strongly desires you to not use uints, and particularly, to never mix int and uint in the same expression, because doing so is confusing and weird. Only use uints for interoperating with unmanaged code that uses uints.

You'll notice that uints are not in the common language subset, and that many quantities that are logically never negative, like the length of a string or an array, are nevertheless always ints in .NET. There's a reason for that. Use ints or longs.

answered Oct 07 '22 by Eric Lippert