I have an array defined as follows:
extern const char config_reg[] = {
0x05, //comment
0x00, //comment
0x00, // \\ <-- double backslash
0x01, //comment
0x03
}
As you can see, there is a double backslash inside a comment (the <-- double backslash and preceding spaces do not appear in the actual source file). When I compile this code (minus the "<-- double backslash") it acts as if the following line is non existent - i.e. equivalent to writing:
extern const char config_reg[] = {
0x05, //comment
0x00, //comment
0x00, //
0x03
}
Is this intended C++ behaviour? If so, what is its intended purpose?
I am compiling using the Parallax Propeller Simple IDE to compile my code - not a particularly good compiler, by all accounts. Is it likely that the compiler implementation is causing this behaviour?
That's correct, assuming that the <-- double backslash and preceding spaces aren't actually in the code.
A single backslash would also produce the same effect.
The newline splicing for backslash-newline occurs before comment analysis, so the 0x01 line is part of the same line as the // \\ comment, so it isn't seen when the comment analysis is done.
The ISO/IEC 14882:2011 (C++11) standard says:
2.2 Phases of translation [lex.phases]
¶1 The precedence among the syntax rules of translation is specified by the following phases.11
Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. Trigraph sequences (2.4) are replaced by corresponding single-character internal representations. Any source file character not in the basic source character set (2.3) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e., using the
\uXXXXnotation), are handled equivalently except where this replacement is reverted in a raw string literal.)Each instance of a backslash character (
\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. If, as a result, a character sequence that matches the syntax of a universal-character-name is produced, the behavior is undefined. A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file.The source file is decomposed into preprocessing tokens (2.5) and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment.12 Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is unspecified. The process of dividing a source file’s characters into preprocessing tokens is context-dependent. [ Example: see the handling of
<within a#includepreprocessing directive. —end example ]11) Implementations must behave as if these separate phases occur, although in practice different phases might be folded together.
12) A partial preprocessing token would arise from a source file ending in the first portion of a multi-character token that requires a terminating sequence of characters, such as a header-name that is missing the closing
"or>. A partial comment would arise from a source file ending with an unclosed/*comment.
Yes, the second phase of translation involves "splicing physical source lines to form logical source lines"; if a line ends with a backslash, the following line is considered to be a continuation of that line. This is the standard behaviour. This occurs before the removal of comments in the third phase, so the fact that the backslash occurs in a comment doesn't change anything.
Line splicing is used quite frequently in C to split macros over multiple lines, since a preprocessor directive extends to the end of the line. It is much rarer in C++, which relies much less on macros than C.
I believe the original purpose in C was to work around restrictions on line length that existed on some now-archaic systems.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With