Both C and C++ support a seemingly equivalent set of escape sequences such as \b, \t, \n, \" and others starting with the backslash character (\). How is a backslash handled if a normal character follows? As far as I remember from several compilers, the escape character \ is silently skipped. On cppreference.com, I read the articles for both languages. I only found this note (in the C article) about orphan backslashes:
ISO C requires a diagnostic if the backslash is followed by any character not listed here: [...]
above the reference table. I also had a look at some online compilers:
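#include <stdio.h>
#include <string.h>  /* strcmp */
int main(void) {
    /* Each call prints 1 if the two literals compare equal, 0 otherwise. */
    printf("%d", !strcmp("\\ x", "\\ x"));
    printf("%d", !strcmp("\\ x", "\\\ x"));   /* "\ " is the orphan backslash in question */
    printf("%d", !strcmp("\\ x", "\\\\ x"));
    return 0;
}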
#include <iostream>
#include <string>
using namespace std;
int main() {
    cout << (string("\\ x") == "\\ x");
    cout << (string("\\ x") == "\\\ x");
    cout << (string("\\ x") == "\\\\ x");
    return 0;
}
Both treat "\\ x" and "\\\ x" as equivalent, with a (kind of) warning via syntax highlighting. In other words, "\\\ x" has been transformed into "\\ x".
Can I assume this to be defined behavior?
"\".
Edit #2: Focus even more on the constant being generated (and portability).
The answer is no. It is an invalid C program, and in C++ the escape sequence is conditionally-supported with implementation-defined semantics.
The C standard says it is syntactically wrong (emphasis mine): it does not produce a valid token, thus the program is invalid:
5.2.1 Character sets
2/ In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or by escape sequences consisting of the backslash \ followed by one or more characters.
6.4.4.4 Character constants
3/ The single-quote ', the double-quote ", the question-mark ?, the backslash \, and arbitrary integer values are representable according to the following table of escape sequences:
- single quote ': \'
- double quote ": \"
- question mark ?: \?
- backslash \: \\
- octal character: \octal digits
- hexadecimal character: \xhexadecimal digits
8/ In addition, characters not in the basic character set are representable by universal character names and certain nongraphic characters are representable by escape sequences consisting of the backslash \ followed by a lowercase letter: \a, \b, \f, \n, \r, \t, and \v. Note: If any other character follows a backslash, the result is not a token and a diagnostic is required.
The C++ standard says differently (emphasis mine):
5.13.3 Character literals
7/ Certain non-graphic characters, the single quote ', the double quote ", the question mark ?, and the backslash \, can be represented according to Table 8. The double quote " and the question mark ?, can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively. Escape sequences in which the character following the backslash is not listed in Table 8 are conditionally-supported, with implementation-defined semantics. An escape sequence specifies a single character.
Thus for C++, you need to look at your compiler's manual for the semantics, but the program is syntactically valid.
You need to compile with a conforming C compiler. Various online compilers tend to use gcc which is by default set to "lax non-standard mode", aka GNU C. This may or may not enable some non-standard escape sequences, but it also won't produce compiler errors even when you violate the C language - you might get away with a "warning", but that doesn't make the code valid C.
If you tell gcc to behave as a conforming C compiler with -std=c17 -pedantic-errors, you get this error:
error: unknown escape sequence: '\040'
040 is octal for 32, which is the ASCII code for ' '. (For some reason gcc uses octal notation for escape sequences internally, might be because \0 is octal, I don't know why.)