
C++11 and [17.5.2.1.3] Bitmask Types

The Standard allows one to choose between an integer type, an enum, and a std::bitset.

Why would a library implementor use one over the other given these choices?

Case in point, llvm's libcxx appears to use a combination of (at least) two of these implementation options:

ctype_base::mask is implemented using an integer type: <__locale>

regex_constants::syntax_option_type is implemented using an enum + overloaded operators: <regex>

The gcc project's libstdc++ uses all three:

ios_base::fmtflags is implemented using an enum + overloaded operators: <bits/ios_base.h>

regex_constants::syntax_option_type is implemented using an integer type: <bits/regex_constants.h>

regex_constants::match_flag_type is implemented using a std::bitset: <bits/regex_constants.h>
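For readers unfamiliar with the "enum + overloaded operators" form, here is a minimal sketch of what it tends to look like. The type open_flags and its enumerators are hypothetical names made up for illustration; this is not copied from libcxx or libstdc++.

```cpp
// Hypothetical bitmask type in the "enum + overloaded operators" style.
// The fixed underlying type (C++11) makes every unsigned value representable.
enum open_flags : unsigned {
    flag_read   = 1u << 0,
    flag_write  = 1u << 1,
    flag_append = 1u << 2
};

// Each operator casts to the underlying type, applies the built-in
// operator, and casts back so the result keeps the enum type.
constexpr open_flags operator|(open_flags a, open_flags b) {
    return static_cast<open_flags>(static_cast<unsigned>(a) | static_cast<unsigned>(b));
}
constexpr open_flags operator&(open_flags a, open_flags b) {
    return static_cast<open_flags>(static_cast<unsigned>(a) & static_cast<unsigned>(b));
}
constexpr open_flags operator^(open_flags a, open_flags b) {
    return static_cast<open_flags>(static_cast<unsigned>(a) ^ static_cast<unsigned>(b));
}
constexpr open_flags operator~(open_flags a) {
    return static_cast<open_flags>(~static_cast<unsigned>(a));
}

// Compound assignments are built from the binary operators above.
inline open_flags& operator|=(open_flags& a, open_flags b) { return a = a | b; }
inline open_flags& operator&=(open_flags& a, open_flags b) { return a = a & b; }
inline open_flags& operator^=(open_flags& a, open_flags b) { return a = a ^ b; }
```

With these in place, an expression such as flag_read | flag_write has type open_flags rather than int, which is what makes the enum form more type safe than a plain integer.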

AFAIK, gdb cannot "detect" the bitfieldness of any of these three choices so there would not be a difference wrt enhanced debugging.

The enum solution and the integer type solution should always use the same space. std::bitset does not seem to guarantee that sizeof(std::bitset<32>) == sizeof(std::uint32_t), so I don't see what is particularly appealing about std::bitset.
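The size concern can be checked at compile time. The sketch below uses hypothetical type names (int_mask, enum_mask, bitset_mask) and assumes the enum is given a fixed underlying type; without one, the enum's size is implementation-defined.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical mask types, used only to compare sizes.
typedef std::uint32_t int_mask;
enum enum_mask : std::uint32_t { bit0 = 1u << 0, bit31 = 1u << 31 };
typedef std::bitset<32> bitset_mask;

// An enum with a fixed underlying type has the size of that type.
static_assert(sizeof(enum_mask) == sizeof(int_mask),
              "enum and integer masks occupy the same space");

// The standard does not promise that this is zero; an implementation may
// store the bits in a larger word, so the value is left to the compiler.
constexpr std::size_t bitset_extra_bytes = sizeof(bitset_mask) - sizeof(int_mask);
```

Printing bitset_extra_bytes on a given platform shows whether std::bitset<32> actually costs more there.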

The enum solution seems slightly less type safe because a combination of masks does not correspond to any enumerator.
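To make that point concrete, here is a small sketch with a hypothetical enum perm that has no overloaded operators. It shows that the built-in | yields an int, and that the combined value, once cast back, matches no named enumerator.

```cpp
#include <type_traits>

// Hypothetical plain enum with no overloaded operators.
enum perm { perm_read = 1, perm_write = 2 };

// The built-in | promotes both operands, so the result is an int, not a perm.
static_assert(std::is_same<decltype(perm_read | perm_write), int>::value,
              "built-in | on a plain enum yields int");

// Casting back gives a valid perm (3 is within the enum's value range),
// but no enumerator is named for it.
constexpr perm read_write = static_cast<perm>(perm_read | perm_write);
```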

Strictly speaking, the aforementioned is with respect to n3376 and not FDIS (as I do not have access to FDIS).

Any available enlightenment in this area would be appreciated.

user1290696 asked Mar 25 '12 01:03


2 Answers

The really surprising thing is that the standard restricts it to just three alternatives. Why shouldn't a class type be acceptable? Anyway…

  • Integral types are the simplest alternative, but they lack type safety. Very old legacy code will tend to use these as they are also the oldest.
  • Enumeration types are safe but cumbersome, and until C++11 they tended to be fixed to the size and range of int.
  • std::bitset may have somewhat more type safety in that bitset<5> and bitset<6> are different types, and addition is disallowed, but it is otherwise unsafe, much like an integral type. This wouldn't be an issue if they had allowed types derived from std::bitset<N>.
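The type-safety differences in the list above can be sketched with a few compile-time checks. All type names here (five_flags, file_mask, file_flags and so on) are hypothetical and exist only for the comparison.

```cpp
#include <bitset>
#include <type_traits>

// std::bitset: different widths are different types...
typedef std::bitset<5> five_flags;
typedef std::bitset<6> six_flags;
static_assert(!std::is_same<five_flags, six_flags>::value,
              "bitset<5> and bitset<6> are distinct types");

// ...but two unrelated masks of the same width are the same type.
typedef std::bitset<5> other_five_flags;
static_assert(std::is_same<five_flags, other_five_flags>::value,
              "same-width bitset masks are interchangeable");
// five_flags() + five_flags();  // would not compile: bitset has no operator+

// Integers: unrelated masks are always the same type.
typedef unsigned file_mask;
typedef unsigned net_mask;
static_assert(std::is_same<file_mask, net_mask>::value,
              "integer masks are interchangeable");

// Enumerations: distinct types even with the same underlying type.
enum file_flags : unsigned { file_read = 1 };
enum net_flags  : unsigned { net_up = 1 };
static_assert(!std::is_same<file_flags, net_flags>::value,
              "enum masks are distinct types");
```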

Clearly enums are the ideal alternative, but experience has proven that the type safety is really unnecessary. So they threw implementers a bone and allowed them to take easier routes. The short answer, then, is that laziness leads implementers to choose int or bitset.

It is a little odd that types derived from bitset aren't allowed, but really that's a minor thing.

The main specification that clause provides is the set of operations defined over these types (i.e., the bitwise operators).
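Because the clause is mostly about the operators, code written only in terms of |, & and ~ can work with any of the three representations. The following is a rough sketch; the helper names with_bits, without_bits and is_set are my own, not anything from the standard.

```cpp
#include <bitset>

// Hypothetical helpers that rely only on the bitwise operators,
// so they accept an integer mask, a std::bitset, or an enum that
// overloads |, & and ~.
template <class Mask>
constexpr Mask with_bits(Mask x, Mask y) { return x | y; }      // set y in x

template <class Mask>
constexpr Mask without_bits(Mask x, Mask y) { return x & ~y; }  // clear y in x

template <class Mask>
constexpr bool is_set(Mask x, Mask y) { return (x & y) != Mask(); }  // nonzero test

// The same helpers applied to a std::bitset at run time.
inline bool bitset_example() {
    std::bitset<8> m = with_bits(std::bitset<8>(1), std::bitset<8>(4));
    m = without_bits(m, std::bitset<8>(1));
    return is_set(m, std::bitset<8>(4)) && !is_set(m, std::bitset<8>(1));
}
```

For integer masks the helpers are usable in constant expressions; for std::bitset they only run at run time, since its bitwise operators are not constexpr in C++11.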

Potatoswatter answered Oct 17 '22 16:10


My preference is to use an enum, but there are sometimes valid reasons to use an integer. Usually ctype_base::mask interacts with the native OS headers, with a mapping from ctype_base::mask to the <ctype.h> implementation-defined constants such as _CTYPE_L and _CTYPE_U used for isupper and islower etc. Using an integer might make it easier to use ctype_base::mask directly with native OS APIs.
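As a rough illustration of why an integer mask is convenient here, consider the sketch below. NATIVE_UPPER, NATIVE_LOWER, my_ctype_base and native_is are all hypothetical stand-ins; the real constants and their values are implementation-defined and differ between systems.

```cpp
// Hypothetical stand-ins for the implementation-defined <ctype.h> bits
// (the real names and values depend on the OS).
const unsigned long NATIVE_UPPER = 0x1;
const unsigned long NATIVE_LOWER = 0x2;

// Hypothetical class modelled loosely on ctype_base.
struct my_ctype_base {
    typedef unsigned long mask;              // integer bitmask type
    static const mask upper = NATIVE_UPPER;  // mask values reuse the native bits
    static const mask lower = NATIVE_LOWER;
};

// A native-style classification check can take the mask directly:
// no cast or translation table is needed between the two representations.
constexpr bool native_is(unsigned long native_bits_of_char, my_ctype_base::mask m) {
    return (native_bits_of_char & m) != 0;
}
```

With an enum, the same check would need a cast at every boundary with the native API, or the enumerators would have to be kept in sync with the native values by hand.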

I don't know why libstdc++'s <regex> uses a std::bitset. When that code was committed I made a mental note to replace the integer types with an enumeration at some point, but <regex> is not a priority for me to work on.

Jonathan Wakely answered Oct 17 '22 15:10