
Unscoped Enumeration, Enumerator & Underlying Type Ambiguity in C++


I was going through the C++ standard draft N4713. Consider the code below:

#include <iostream>
#include <type_traits>

enum UEn
{
    EN_0,
    EN_1,
    EN_L = 0x7FFFFFFFFFFFFFFF            // EN_L has type "long int"
};                                       // UEn has underlying type "unsigned long int"

int main()
{
    long lng = 0x7FFFFFFFFFFFFFFF;

    std::cout << std::boolalpha;
    std::cout << "typeof(unsigned long == UEn):" << std::is_same<unsigned long, std::underlying_type<UEn>::type>::value << std::endl;  // Outputs "true"
    std::cout << "sizeof(EN_L):" << sizeof(EN_L) << std::endl;
    std::cout << "sizeof(unsigned):" << sizeof(unsigned) << std::endl;
    std::cout << "sizeof(unsigned long):" << sizeof(unsigned long) << std::endl;
    std::cout << "sizeof(unsigned long long):" << sizeof(unsigned long long) << std::endl;

    lng = EN_L + 1;                      // Invokes UB as EN_L is 0x7FFFFFFFFFFFFFFF and has type "long int"

    return 0;
}

The above code outputs (tested on g++-8.1, Clang):

typeof(unsigned long == UEn):true
sizeof(EN_L):8
sizeof(unsigned):4
sizeof(unsigned long):8
sizeof(unsigned long long):8

As per Section 10.2p5 (10.2 Enumeration declarations):

Following the closing brace of an enum-specifier, each enumerator has the type of its enumeration...If the underlying type is not fixed, the type of each enumerator prior to the closing brace is determined as follows:

  • If an initializer is specified for an enumerator, the constant-expression shall be an integral constant expression (8.6). If the expression has unscoped enumeration type, the enumerator has the underlying type of that enumeration type, otherwise it has the same type as the expression.

  • If no initializer is specified for the first enumerator, its type is an unspecified signed integral type.

  • Otherwise the type of the enumerator is the same as that of the preceding enumerator unless the incremented value is not representable in that type, in which case the type is an unspecified integral type sufficient to contain the incremented value. If no such type exists, the program is ill-formed.

Further, section 10.2p7 states:

For an enumeration whose underlying type is not fixed, the underlying type is an integral type that can represent all the enumerator values defined in the enumeration. If no integral type can represent all the enumerator values, the enumeration is ill-formed. It is implementation-defined which integral type is used as the underlying type except that the underlying type shall not be larger than int unless the value of an enumerator cannot fit in an int or unsigned int.


Thus I have the following questions:

  1. Why is the underlying type of enum UEn an unsigned long when 0x7FFFFFFFFFFFFFFF is an integer constant of type long int, and thus the type of EN_L is also long int? Is this a compiler bug or well-defined behaviour?
  2. When the standard says each enumerator has the type of its enumeration, shouldn't that imply that the integral types of the enumerator and the enumeration also match? What could be the reason for letting these two differ from each other?
Cheshar asked Jan 21 '19 18:01


2 Answers

The underlying type is implementation-defined. It only has to be able to represent every enumerator, and it can't be larger than int unless required. Per [dcl.enum]/7, which you already found, there is no requirement on signedness beyond that. This limits the back-propagation of the enumerators' types more than you appear to assume. Notably, nothing says that the underlying type of the enum has to be the type of any enumerator's initializer.

Clang prefers unsigned integers over signed integers as underlying types; that's all there is to it. Importantly, the underlying type of the enum does not have to match any specific enumerator's type: it only has to be able to represent every enumerator. This is fairly normal and well understood in other contexts. For instance, if you had EN_1 = 1, it wouldn't surprise you that the enum's underlying type isn't int or unsigned int, even though 1 is an int.
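
A minimal sketch of what [dcl.enum]/7 does and does not pin down, assuming a C++11 compiler. The names Small, Big, SmallU, BigU and big_underlying_is_unsigned are made up for illustration. Only the two static_asserts follow from the quoted wording; the signedness flag simply records whatever the implementation picked (the question reports unsigned long for g++ 8.1 and Clang).

```cpp
#include <limits>
#include <type_traits>

// Hypothetical enums, for illustration only.
enum Small { S_0, S_1 };
enum Big { B_0, B_L = 0x7FFFFFFFFFFFFFFF };

typedef std::underlying_type<Small>::type SmallU;
typedef std::underlying_type<Big>::type BigU;

// Guaranteed: every enumerator of Small fits in an int, so the
// underlying type may not be larger than int.
static_assert(sizeof(SmallU) <= sizeof(int),
              "underlying type of Small is no larger than int");

// Guaranteed: the underlying type can represent B_L, which needs
// at least 63 value bits.
static_assert(std::numeric_limits<BigU>::digits >= 63,
              "underlying type of Big can hold B_L");

// Not guaranteed either way: signedness is implementation-defined,
// even though the literal initializing B_L has a signed type.
constexpr bool big_underlying_is_unsigned = std::is_unsigned<BigU>::value;
```

The translation unit compiles on its own; printing big_underlying_is_unsigned from a main function shows which choice a given compiler made.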

You are also correct in saying that the type of 0x7fffffffffffffff is long. Clang agrees with you; however, it implicitly converts the constant to unsigned long:

TranslationUnitDecl
`-EnumDecl <line:1:1, line:5:1> line:1:6 Foo
  |-EnumConstantDecl <line:2:5> col:5 Frob 'Foo'
  |-EnumConstantDecl <line:3:5> col:5 Bar 'Foo'
  `-EnumConstantDecl <line:4:5, col:11> col:5 Baz 'Foo'
    `-ImplicitCastExpr <col:11> 'unsigned long' <IntegralCast>
      `-IntegerLiteral <col:11> 'long' 576460752303423487

This is allowed because, as noted above, the enumeration's underlying type doesn't need to be the verbatim type of any enumerator.

When the standard says that each enumerator has the type of the enumeration, it means that the type of EN_1 is enum UEn after the enum's closing brace. Note the "after the closing brace" and "prior to the closing brace" mentions.

Prior to the closing brace, if the enum has no fixed underlying type, the type of each enumerator with an initializer is the type of its initializing expression, but this is only temporary. This is what allows you, for instance, to write EN_2 = EN_1 + 1 without casting EN_1. The same works inside an enum class, whose enumerators have the fixed underlying type until the closing brace. This is no longer true after the closing brace. You can trick the compiler into showing you by inspecting error messages or by looking at disassembly:

template<typename T>
T tell_me(const T&& value);

enum Foo {
    Baz = 0x7ffffffffffffff,
    Frob = tell_me(Baz)
    // non-constexpr function 'tell_me<long>' cannot be used in a constant expression
};

Notice that in this case T was deduced to be long, but after the closing brace, it's deduced to be Foo:

template<typename T>
T tell_me(const T&& value);

enum Foo {
    Baz = 0x7ffffffffffffff
};

int main() {
    tell_me(Baz);
    // call    Foo tell_me<Foo>(Foo const&&)
}
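
The same before/after distinction can also be checked without reading diagnostics or disassembly. The sketch below is an assumption-laden variant of the example above: Qux, LiteralType, BazMatchesLiteral and QuxMatchesLiteral are names invented here, and the probe relies on decltype being usable on an enumerator inside its own enum, which the compilers mentioned in the question are expected to accept.

```cpp
#include <type_traits>

// Type of the literal itself (long on the LP64 targets discussed here).
typedef decltype(0x7ffffffffffffff) LiteralType;

enum Foo {
    Baz = 0x7ffffffffffffff,
    Qux,  // no initializer: takes the type of the preceding enumerator
    // Inside the braces, Baz and Qux still have the literal's type.
    BazMatchesLiteral = std::is_same<decltype(Baz), LiteralType>::value,
    QuxMatchesLiteral = std::is_same<decltype(Qux), LiteralType>::value
};

// Both probes were evaluated before the closing brace.
static_assert(BazMatchesLiteral == 1 && QuxMatchesLiteral == 1,
              "enumerators have their initializer's type inside the braces");

// After the closing brace, the enumerator has the enumeration type.
static_assert(std::is_same<decltype(Baz), Foo>::value,
              "enumerators have type Foo after the closing brace");
```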

If you want your enum type to be signed with Clang, you need to specify it using the : base_type syntax, or you need to have a negative enumerator.
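
Both options can be sketched as follows, assuming a target where long is 64 bits wide (as in the question's output). FixedSigned and WithNegative are hypothetical names. The second assertion does not depend on Clang's preference: an underlying type that must represent -1 cannot be unsigned.

```cpp
#include <type_traits>

// Option 1: fix the underlying type explicitly.
enum FixedSigned : long { FS_0, FS_L = 0x7FFFFFFFFFFFFFFF };

// Option 2: leave it unfixed but add a negative enumerator.
enum WithNegative { WN_MIN = -1, WN_L = 0x7FFFFFFFFFFFFFFF };

static_assert(std::is_same<std::underlying_type<FixedSigned>::type, long>::value,
              "a fixed underlying type is exactly the type written");
static_assert(std::is_signed<std::underlying_type<WithNegative>::type>::value,
              "a negative enumerator requires a signed underlying type");
```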

zneak answered Jan 04 '23 18:01


I believe the explanation for this (admittedly unintuitive) warning is in 7.6 Integral promotions [conv.prom]:

A prvalue of an unscoped enumeration type whose underlying type is not fixed (10.2) can be converted to a prvalue of the first of the following types that can represent all the values of the enumeration (i.e., the values in the range b_min to b_max as described in 10.2): int, unsigned int, long int, unsigned long int, long long int, or unsigned long long int.

I.e., if your underlying type is not fixed, and you use an enumeration member in an expression, it doesn't necessarily convert to the enumeration's underlying type. It instead converts to the first type in that list in which all members fit.

Don't ask me why, the rule seems nuts to me.
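
A small sketch of that rule applied to the enum from the question, assuming a conforming C++11 compiler. Underlying, Promoted, promoted_matches_underlying and next_value are names introduced here for illustration. Unary + is used only because it applies the integral promotion without doing anything else.

```cpp
#include <type_traits>

enum UEn { EN_0, EN_1, EN_L = 0x7FFFFFFFFFFFFFFF };

typedef std::underlying_type<UEn>::type Underlying;  // implementation-defined
typedef decltype(+EN_L) Promoted;                    // result of integral promotion

// All values of UEn fit in a signed 64-bit type, and a signed type
// precedes its unsigned counterpart in the list, so Promoted is signed.
static_assert(std::is_signed<Promoted>::value,
              "UEn promotes to a signed type");

// May well be false: the question reports an unsigned long underlying type.
constexpr bool promoted_matches_underlying =
    std::is_same<Promoted, Underlying>::value;

// EN_L + 1 is evaluated in Promoted and overflows it. A hypothetical
// helper that does the increment in unsigned arithmetic instead:
constexpr unsigned long long next_value(UEn e) {
    return static_cast<unsigned long long>(e) + 1u;
}

static_assert(next_value(EN_L) == 0x8000000000000000ULL,
              "unsigned increment is well defined");
```

Casting before the addition is one way to sidestep the promotion; whether it is preferable to fixing the underlying type depends on how the enum is used elsewhere.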

This section goes on to say:

A prvalue of an unscoped enumeration type whose underlying type is fixed (10.2) can be converted to a prvalue of its underlying type.

I.e. if you fix the underlying type with unsigned long:

enum UEn : unsigned long
...

then the warning goes away.

Another way to get rid of the warning (and leave the underlying type not fixed) is to add a member which requires unsigned long storage:

EN_2 = 0x8000000000000000

Then again, the warning goes away.
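
Both workarounds can be checked at compile time. This is a sketch under the assumption that unsigned long is 64 bits wide, as in the question's output; Fixed and Widened are made-up stand-ins for the two variants of UEn.

```cpp
#include <type_traits>

// Variant 1: underlying type fixed to unsigned long.
enum Fixed : unsigned long { FX_0, FX_1, FX_L = 0x7FFFFFFFFFFFFFFF };

// Variant 2: underlying type not fixed, but one enumerator needs
// the full unsigned 64-bit range.
enum Widened { W_0, W_1, W_L = 0x7FFFFFFFFFFFFFFF, W_2 = 0x8000000000000000 };

// With a fixed underlying type, promotion yields that type.
static_assert(std::is_same<decltype(+FX_L), unsigned long>::value,
              "Fixed promotes to its underlying type");

// With W_2 present, no signed 64-bit type can hold every value.
static_assert(std::is_unsigned<decltype(+W_L)>::value,
              "Widened promotes to an unsigned type");

// In both cases the increment is unsigned arithmetic and does not overflow.
static_assert(FX_L + 1 == 0x8000000000000000, "Fixed: no overflow");
static_assert(W_L + 1 == W_2, "Widened: no overflow");
```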

Good question. I learned a lot in answering it.

Howard Hinnant answered Jan 04 '23 18:01