
Why is "\?" an escape sequence in C/C++?

There are four special non-alphabet characters that have escape sequences in C/C++: the single quote \', the double quote \", the backslash \\, and the question mark \?. For the first three the reason is apparent: ' delimits character constants, " delimits string literals, and \ introduces escape sequences. But why is ? one of them?

I read the table of escape sequences in a textbook today and I realized that I've never escaped ? before and have never encountered a problem with it. Just to be sure, I tested it under GCC:

    #include <stdio.h>

    int main(void) {
        printf("question mark ? and escaped \?\n");
        return 0;
    }

And the C++ version:

    #include <iostream>

    int main(void) {
        std::cout << "question mark ? and escaped \?" << std::endl;
        return 0;
    }

Both programs output: question mark ? and escaped ?

So I have two questions:

  1. Why is \? one of the escape sequence characters?
  2. Why does non-escaping ? work fine? There's not even a warning.

The more interesting fact is that the escaped \? can be used the same as ? in some other languages as well. I tested in Lua/Ruby, and it's also true even though I didn't find this documented.

Yu Hao asked Oct 15 '13 06:10



1 Answer

Why is \? one of the escape sequence characters?

Because it has a special meaning too. The answer leads to trigraphs: early in translation, before string literals are even tokenized, the C/C++ compiler replaces the following three-character sequences with the corresponding single character (C11 §5.2.1.1 and C++11 §2.3).

    Trigraph:       ??(  ??)  ??<  ??>  ??=  ??/  ??'  ??!  ??-
    Replacement:      [    ]    {    }    #    \    ^    |    ~

Trigraphs are nearly useless now and are mainly used for obfuscation. Some examples can be seen in the IOCCC.
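The replacement in the table above can be imitated at run time. The following is only a sketch of the mapping, not how any compiler implements it; `replace_trigraphs` and its buffer convention are hypothetical names chosen for illustration. The lookup tables are deliberately written without any literal trigraph, so the source means the same thing whether or not the compiler has trigraphs enabled.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: apply the nine trigraph replacements to `in`, writing to `out`.
 * `out` must have room for strlen(in) + 1 bytes (the result never grows). */
void replace_trigraphs(const char *in, char *out)
{
    /* Third character of each trigraph and the character it stands for,
     * in the same order as the table. */
    static const char third[]  = "()<>=/'!-";
    static const char single[] = "[]{}#\\^|~";

    while (*in != '\0') {
        const char *hit = NULL;

        /* Two question marks followed by a known third character. */
        if (in[0] == '?' && in[1] == '?' && in[2] != '\0')
            hit = strchr(third, in[2]);

        if (hit != NULL) {
            *out++ = single[hit - third];
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

For instance, calling `replace_trigraphs("\?\?=define", buf)` with a large enough `buf` leaves `#define` in it.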

GCC doesn't process trigraphs by default and will warn you if there's a trigraph in the code, unless the option -trigraphs is enabled. Under the -trigraphs option, \? becomes useful, as in the following example:

printf("\?\?!\n"); 

With the escapes, the output is ??!. If the ? characters were not escaped, the sequence ??! would be replaced and the output would be |.

For more information on trigraphs, see Cryptic line "??!??!" in legacy code
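The effect of the escape can be checked without depending on compiler flags. Here is a small sketch in standard C; the function names are made up for illustration. Besides \?, it shows a second commonly used workaround that the answer does not mention: splitting the literal so that the two question marks never sit next to each other in the source. This works because adjacent string literals are concatenated only after trigraph replacement has taken place.

```c
#include <string.h>

/* Both functions return the three characters ? ? ! whether or not
 * the compiler performs trigraph replacement. */

/* Escaping the question marks breaks up the trigraph in the source. */
const char *marks_escaped(void)
{
    return "\?\?!";
}

/* Splitting the literal also works: the pieces are joined later. */
const char *marks_split(void)
{
    return "?" "?!";
}

/* Returns 1 when both spellings produce the same three-character string. */
int spellings_agree(void)
{
    return strlen(marks_escaped()) == 3
        && strcmp(marks_escaped(), marks_split()) == 0;
}
```

Which spelling to prefer is a matter of taste; the escaped form keeps the literal in one piece, while the split form avoids backslashes.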


Why does non-escaping ? work fine? There's not even a warning.

Because, according to the standard, ? (and the double quote ") can be represented by themselves:

C11 §6.4.4.4 Character constants, paragraph 4

The double-quote " and question-mark ? are representable either by themselves or by the escape sequences \" and \?, respectively, but the single-quote ' and the backslash \ shall be represented, respectively, by the escape sequences \' and \\.
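The rule quoted above can be checked directly. This is a minimal sketch with hypothetical function names; it only compares the spellings the standard allows.

```c
#include <string.h>

/* Returns 1 if the escaped and unescaped spellings of ? and the double
 * quote denote the same characters, as C11 6.4.4.4 says they must. */
int optional_escapes_match(void)
{
    return '?' == '\?'                     /* ? may be written either way */
        && '"' == '\"'                     /* so may the double quote in a character constant */
        && strcmp("why?", "why\?") == 0;   /* same inside string literals */
}

/* Inside a character constant the single quote must be escaped, and the
 * backslash must be escaped everywhere. Inside a string literal the
 * single quote may stand for itself. */
int required_escapes_ok(void)
{
    return '\'' == "'"[0]        /* escaped and string spellings agree */
        && strlen("\\") == 1;    /* two source characters, one backslash */
}
```

Both functions return 1 on a conforming implementation.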

Similarly in C++:

C++11 §2.13.2 Character literals, paragraph 3

Certain nongraphic characters, the single quote ', the double quote ", the question mark ?, and the backslash \, can be represented according to Table 6. The double quote " and the question mark ?, can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively. If the character following a backslash is not one of those specified, the behavior is undefined. An escape sequence specifies a single character.

Yu Hao answered Oct 08 '22 14:10