
Confusing sizeof(char) by ISO/IEC in different character set encoding like UTF-16

Assume a program is running on a system whose character set uses the UTF-16 encoding. According to The C++ Programming Language, 4th edition, page 150:

A char can hold a character of the machine’s character set.

→ I would therefore expect a char variable to have a size of 2 bytes.

But according to ISO/IEC 14882:2014:

"sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1."

or The C++ Programming Language, 4th edition, page 149:

"[...], so by definition the size of a char is 1"

→ So the size is fixed at 1.

Question: Is there a conflict between the statements above, or is sizeof(char) == 1 just a definition, with the actual size being implementation-defined and varying from system to system?

asked Mar 30 '15 by kembedded



1 Answer

The C++ standard (and C, for that matter) effectively defines a byte as the size of a char type, not as an eight-bit quantity¹. As per C++11 1.7/1 (my bold):

The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation defined.

Hence the expression sizeof(char) is always 1, no matter what.

If you want to see whether your baseline char variable (probably the unsigned variant would be best) can actually hold a 16-bit value, the item you want to look at is CHAR_BIT from <climits>. This holds the number of bits in a char variable.


¹ Many standards, especially ones related to communications protocols, use the more exact term octet for an eight-bit value.

answered Nov 10 '22 by paxdiablo