Yesterday, I was surprised to come across some code that seemed to treat char[] as being a type:
typedef std::unique_ptr<char[]> CharPtr;
Previously, I would have written something like:
typedef std::unique_ptr<char, CharDeleter> CharPtr;
// Custom definition of CharDeleter omitted
After some research, I discovered that the char[] syntax works because std::unique_ptr provides a template specialization to handle arrays (i.e. it will automatically invoke delete[] for the array without requiring a custom deleter).
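To make the array specialisation concrete, here is a minimal sketch assuming a C++11 compiler. The Tracker type, the destroyed counter and both helper functions are hypothetical, added only to count destructor calls: with std::unique_ptr<Tracker[]> every element is destroyed (delete[] semantics), and operator[] is available for element access.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper type: counts how many elements have been destroyed.
static int destroyed = 0;
struct Tracker {
    ~Tracker() { ++destroyed; }
};

// Allocates n Trackers owned by std::unique_ptr<Tracker[]> and returns how
// many destructors ran once the smart pointer released them.
int destroy_count_for_array(int n) {
    destroyed = 0;
    {
        // Array form: the default deleter calls delete[], not delete.
        std::unique_ptr<Tracker[]> p(new Tracker[n]);
    } // p goes out of scope here, so every element's destructor runs.
    return destroyed;
}

// The array form also provides operator[] for element access.
char third_char() {
    std::unique_ptr<char[]> buf(new char[4]()); // value-initialised to '\0'
    buf[0] = 'a';
    buf[1] = 'b';
    buf[2] = 'c';
    return buf[2];
}
```

Using the non-array std::unique_ptr<Tracker> on memory obtained from new[] would call plain delete, which is undefined behaviour; avoiding that is the point of the specialisation.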
But what does char[] actually mean in C++?
I've seen syntax like:
const char a[] = "Constant string"; // Example 1
char *p = new char[5]; // Example 2
bool foo(char param[10]); // Example 3
This is how I interpret these examples:
Example 1 allocates a static array (on the stack), and the empty brackets are valid because the true size of the string is known at compile time (i.e. the compiler is basically handling the length for us behind the scenes).
Example 2 dynamically allocates 5 contiguous characters with the first character being stored at the address stored in p.
Example 3 defines a function that takes an array of size 10 as a parameter. (Behind the scenes the compiler treats the array like a pointer) -- e.g. it is an error to have:
void foo(char test[5]) {}
void foo(char * test) {}
because the function signatures are ambiguous to the compiler.
I feel like I understand the array/pointer differences and similarities. My confusion likely stems from my lack of experience with building/reading C++ templates.
I know that a template specialization basically allows a customized template (based on a particular template) to be used depending on the template type parameters. Is char[] simply a syntax that is available for template specialization (invoking a particular specialization)?
Also, what is the proper name for array "types" like char[]?
What does char[] actually mean in C++?
Let's find out:
[C++11: 8.3.4/1]: In a declaration T D where D has the form
D1 [ constant-expression_opt ] attribute-specifier-seq_opt
and the type of the identifier in the declaration T D1 is “derived-declarator-type-list T”, then the type of the identifier of D is an array type; if the type of the identifier of D contains the auto type-specifier, the program is ill-formed. T is called the array element type; this type shall not be a reference type, the (possibly cv-qualified) type void, a function type or an abstract class type. If the constant-expression (5.19) is present, it shall be an integral constant expression and its value shall be greater than zero. The constant expression specifies the bound of (number of elements in) the array. If the value of the constant expression is N, the array has N elements numbered 0 to N-1, and the type of the identifier of D is “derived-declarator-type-list array of N T”. An object of array type contains a contiguously allocated non-empty set of N subobjects of type T. Except as noted below, if the constant expression is omitted, the type of the identifier of D is “derived-declarator-type-list array of unknown bound of T”, an incomplete object type. The type “derived-declarator-type-list array of N T” is a different type from the type “derived-declarator-type-list array of unknown bound of T”, see 3.9. [..]
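The distinction drawn in that quote can be checked at compile time with the standard <type_traits> header (C++11). This is only an illustration: every check is a static_assert, so the file compiling at all is the test.

```cpp
#include <type_traits>

// char[] ("array of unknown bound of char") and char[10] ("array of 10 char")
// are both array types, but they are different types.
static_assert(std::is_array<char[]>::value, "char[] is an array type");
static_assert(std::is_array<char[10]>::value, "char[10] is an array type");
static_assert(!std::is_same<char[], char[10]>::value, "they are different types");

// The bound is part of the type: std::extent reports 0 for an unknown bound.
static_assert(std::extent<char[]>::value == 0, "unknown bound");
static_assert(std::extent<char[10]>::value == 10, "bound of 10");

// Both share the same element type, char.
static_assert(std::is_same<std::remove_extent<char[]>::type, char>::value,
              "element type of char[] is char");
```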
As you point out, these "arrays of unknown bound" are being used through a std::unique_ptr specialisation.
Regarding example 1, char[] with an initialiser is a special case that is not covered by the above text (and, surprisingly, it is rather unclear in [C++11: 8.5.2]): a is in fact a const char[16]. So, yes, "the compiler is basically handling the length for us behind the scenes".
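A small sketch of example 1, assuming C++11: the bound is deduced from the string literal, so sizeof counts the terminating '\0' while std::strlen does not. The helper length_of_a is hypothetical, added only to make the difference observable at run time.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Example 1 from the question: the bound is omitted, so the compiler
// computes it from the string literal (15 characters plus the '\0').
const char a[] = "Constant string";

static_assert(std::is_same<decltype(a), const char[16]>::value,
              "a is const char[16]");
static_assert(sizeof(a) == 16, "the terminating null character is counted");

// Hypothetical helper: strlen stops at the '\0', so it reports 15.
std::size_t length_of_a() { return std::strlen(a); }
```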
Example 3 defines a function that takes an array of size 10 as a parameter. (Behind the scenes the compiler treats the array like a pointer)
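Almost. In fact there's nothing "behind-the-scenes" about it: the conversion is in the brochure. It's front and centre, explicit and standardised: a parameter declared as "array of T" is adjusted to "pointer to T" when the function's type is determined.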
So:
-- e.g. it is an error to have:
void foo(char test[5]) {}
void foo(char * test) {}
because the function signatures are ambiguous to the compiler.
In fact it is an error not through "ambiguity", but because you literally defined the same function twice.
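The "same function twice" point can be verified directly, assuming C++11: once the parameter adjustment is applied, the function types are identical, and inside the function sizeof sees a pointer. The helper param_size is hypothetical, added for illustration (some compilers warn about sizeof on an array parameter for exactly this reason).

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A parameter declared as "array of T" is adjusted to "pointer to T",
// so these function types are identical.
static_assert(std::is_same<void(char[5]), void(char*)>::value,
              "a char[5] parameter is really char*");
static_assert(std::is_same<void(char[]), void(char*)>::value,
              "a char[] parameter is really char*");

// Hypothetical helper: inside the function, sizeof(test) is the size of a
// pointer, not 5, because test really has type char*.
std::size_t param_size(char test[5]) { return sizeof(test); }
```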
char[] is a type, but a type that you cannot have an instance of. It is an incomplete object type, somewhat like a struct that has only been forward-declared (struct foo;).
This means that templates can consume char[] as a type if they choose to. They cannot create a variable of type char[], but they can interact with the type.
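As a sketch of how a template "interacts" with char[], here is a hypothetical deleter, Releaser, with a partial specialisation for T[]. It is loosely modelled on the way std::default_delete is specialised for arrays, but it is not the library's code; char[] simply matches T[] with T = char.

```cpp
// Hypothetical deleter template. The primary template handles single objects...
template <class T>
struct Releaser {
    static constexpr bool array_form = false;
    void operator()(T* p) const { delete p; }
};

// ...and this partial specialisation matches any "array of unknown bound of T".
// No object of type T[] is ever created; the type only selects the behaviour.
template <class T>
struct Releaser<T[]> {
    static constexpr bool array_form = true;
    void operator()(T* p) const { delete[] p; } // note: pointer to T, not to T[]
};
```

Usage would look like Releaser<char[]>()(new char[8]); for an array and Releaser<char>()(new char('x')); for a single object.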
Now, there is a lot of "magic" behavior attached to arrays, inherited from C. As a function parameter, char[] becomes char* (as does char[33]!).
As a local variable, char x[]="foo"; or char y[]={'a','b','c'}; becomes an array of fixed size. Here, char[] means "auto-size the array".
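A compile-time check of the auto-sizing, assuming C++11. The variables are placed at namespace scope for brevity; the same bound deduction applies to local variables.

```cpp
#include <type_traits>

// The bound is omitted, so it is computed from the initialiser.
char x[] = "foo";           // 3 characters plus the terminating '\0'
char y[] = {'a', 'b', 'c'}; // exactly 3 elements, no '\0' is added

static_assert(std::is_same<decltype(x), char[4]>::value, "x is char[4]");
static_assert(std::is_same<decltype(y), char[3]>::value, "y is char[3]");
```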
In a sense, these are both quirks of parameter types and variable declarations rather than quirks of the type itself. The type you write doesn't look all that much like the type you are actually declaring.
There is also a bunch of strangeness involving type decay: a variable of type char[3] like char x[3]; will decay to char* at the drop of a hat. This, much like auto-sizing arrays, is basically a legacy from C.
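A short sketch of decay, assuming C++11: std::decay shows the conversion at the type level, and the hypothetical helpers count_until and decay_demo show an array silently turning into a pointer to its first element.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// std::decay applies the array-to-pointer conversion at the type level.
static_assert(std::is_same<std::decay<char[3]>::type, char*>::value,
              "char[3] decays to char*");

// Hypothetical helper that only accepts a pointer, not an array.
std::size_t count_until(const char* p, char stop) {
    std::size_t n = 0;
    while (p[n] != stop) ++n;
    return n;
}

// x is a char[3], but it converts to a pointer to its first element
// both in the initialisation of p and when passed to count_until.
bool decay_demo() {
    char x[3] = {'a', 'b', 'c'};
    char* p = x;
    return p == &x[0] && count_until(x, 'c') == 2;
}
```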
All of this is explicitly described in the standard, but because it differs significantly from most "regular" types, it acts like magic.
After all, any sufficiently obtuse feature of the standard is indistinguishable from magic.
Yes, char[] denotes the compound type "array of unknown bound of char". It is an incomplete type, but one that can be completed later:
extern char a[]; // "a" has incomplete type at point of declaration
char a[10]; // Now "a" has complete type.
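The same two declarations as a compilable sketch, assuming C++11, with a check that the type really is complete after the definition. The sizeof use is deliberately placed only after the second declaration, since applying sizeof to the incomplete type would not compile.

```cpp
#include <type_traits>

extern char a[];   // "a" has incomplete type: array of unknown bound of char
// sizeof(a) at this point would be ill-formed, because the type is incomplete.

char a[10];        // the definition completes the type to char[10]

static_assert(sizeof(a) == 10, "type completed by the definition");
static_assert(std::extent<decltype(a)>::value == 10, "bound is now known");
```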