
Is the character set of a char literal guaranteed to be ASCII?

Coming from a discussion started here, does the standard specify values for characters? So, is '0' guaranteed to be 48? That's what ASCII would tell us, but is it guaranteed? If not, have you seen any compiler where '0' isn't 48?

asked Oct 30 '12 by Luchian Grigore

People also ask

What character set is ASCII?

ASCII stands for "American Standard Code for Information Interchange". It was designed in the early 1960s as a standard character set for computers and electronic devices. ASCII is a 7-bit character set containing 128 characters.

How does the ASCII character set work?

The ASCII character set is a 7-bit set of codes that allows 128 different characters. That is enough for every upper-case letter, lower-case letter, digit and punctuation mark on most keyboards. ASCII covers only the characters needed for English-language text.
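As an illustration (not part of the original answer), a short C++ sketch that prints the printable portion of that 7-bit range; the specific numbers it shows assume an ASCII-compatible execution character set:

    #include <iostream>

    int main() {
        // Print the printable ASCII range (codes 32 through 126) together with
        // the character each code maps to. The mapping shown assumes the
        // implementation's execution character set is ASCII-compatible.
        for (int code = 32; code < 127; ++code) {
            std::cout << code << " -> " << static_cast<char>(code) << '\n';
        }
        return 0;
    }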

What is universal character in c++?

Universal character names are formed by a prefix \U followed by an eight-digit Unicode code point, or by a prefix \u followed by a four-digit Unicode code point. All eight or four digits, respectively, must be present to make a well-formed universal character name.
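For example, a minimal C++ snippet using both forms (the code points chosen here are purely illustrative):

    #include <cstdint>
    #include <iostream>

    int main() {
        // \u must be followed by exactly four hex digits, \U by exactly eight.
        char32_t e_acute = U'\u00E9';      // U+00E9 LATIN SMALL LETTER E WITH ACUTE
        char32_t smiley  = U'\U0001F600';  // U+1F600 needs the eight-digit \U form
        std::cout << std::hex
                  << static_cast<std::uint_least32_t>(e_acute) << ' '
                  << static_cast<std::uint_least32_t>(smiley) << '\n';  // e9 1f600
        return 0;
    }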

What is execution character set?

The execution character set is the encoding used for the text of your program that is input to the compilation phase after all preprocessing steps. This character set is used for the internal representation of any string or character literals in the compiled code.
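A quick way to see which narrow execution character set your compiler used is to dump the bytes of a string literal. A sketch follows; the output shown assumes a UTF-8 narrow execution character set (a Latin-1 build would store a single 0xE9 byte for the accented character instead):

    #include <cstdio>

    int main() {
        // The bytes stored for this literal are chosen by the compiler's narrow
        // execution character set; with UTF-8 the output is "68 69 c3 a9".
        const char text[] = "hi\u00E9";
        for (const char* p = text; *p != '\0'; ++p) {
            std::printf("%02x ", static_cast<unsigned>(static_cast<unsigned char>(*p)));
        }
        std::printf("\n");
        return 0;
    }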


2 Answers

No. There's no requirement for either the source or the execution character set to use an encoding with an ASCII subset. I haven't seen any non-ASCII implementations, but I know someone who knows someone who has. (It is required that '0' through '9' have contiguous integer values, but that's a duplicate question elsewhere on SO.)

The encoding used for the source character set controls how the bytes of your source code are interpreted into the characters used in the C++ language. The standard describes the members of the execution character set as having values, and it is the encoding that maps these characters to their corresponding values that determines the integer value of '0'.

Although the execution character set must contain at least all the members of the basic source character set, plus some control characters and a null character with value zero (all with appropriate values), there is no requirement for the encoding to be ASCII or for it to use ASCII values for any particular subset of characters (other than the null character).
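To make the point concrete, here is a small check one could run; it only reports what the current implementation happened to choose, not anything the standard guarantees:

    #include <iostream>

    int main() {
        // Nothing in the standard forces '0' to be 48; this just reports what the
        // implementation's execution character set assigned to it.
        std::cout << "'0' has the value " << static_cast<int>('0') << '\n';
        if ('0' == 48 && 'A' == 65 && 'a' == 97) {
            std::cout << "Looks ASCII-compatible.\n";
        } else {
            std::cout << "Not ASCII (on EBCDIC, '0' is 240).\n";
        }
        return 0;
    }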

answered Sep 20 '22 by CB Bailey


No, the Standard is very careful not to specify what the source character encoding is.

C and C++ compilers run on EBCDIC computers too, you know, where '0' != 0x30.

However, I believe it is required that '1' == '0' + 1.
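That contiguity guarantee (the decimal digits '0' through '9' have consecutive, increasing values) is what makes the classic digit-to-value idiom portable. A minimal sketch:

    #include <iostream>

    // Relies only on the guarantee that '0'..'9' are contiguous and increasing,
    // so it works on ASCII and EBCDIC implementations alike.
    int digit_value(char c) {
        return c - '0';
    }

    int main() {
        std::cout << digit_value('7') << '\n';  // prints 7 on any conforming compiler
        return 0;
    }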

answered Sep 19 '22 by Ben Voigt