Unknown meta-character in C/C++ string literal?

I created a new project with the following code segment:

char* strange = "(Strange??)";
cout << strange << endl;

resulting in the following output:

(Strange]

Thus translating '??)' -> ']'

Debugging it shows that my char* string literal is actually that value and it's not a stream translation. This is obviously not a meta-character sequence I've ever seen. Some sort of Unicode or wide char sequence perhaps? I don't think so however... I've tried disabling all related project settings to no avail.

Anyone have an explanation?

  • search : 'question mark, question mark, close brace' c c++ string literal
Marius asked Nov 03 '09 19:11


5 Answers

What you're seeing is called a trigraph.

In written language by grown-ups, one question mark is sufficient for any situation. Don't use more than one at a time and you'll never see this again.

GCC ignores trigraphs by default because hardly anyone uses them intentionally. Enable them with the -trigraphs option (the strict ISO modes selected by -ansi, -std=c99 and similar imply it), or tell the compiler to warn you about them with the -Wtrigraphs option.

Visual C++ 2010 also disables them by default and offers /Zc:trigraphs to enable them. I can't find anything about ways to enable or disable them in prior versions.
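
If you are not sure which mode your compiler is in, a small probe can tell you. The following is only a sketch: the names kProbeSize, trigraphs_replaced and report_trigraph_mode are made up for illustration, and the probe literal will usually trigger a trigraph warning, which is expected here.

```cpp
#include <cstddef>
#include <iostream>

// The literal below deliberately contains a trigraph sequence. If the
// compiler performs trigraph replacement it shrinks to a single ']' plus
// the terminator (2 bytes); otherwise it keeps all three characters (4 bytes).
constexpr std::size_t kProbeSize = sizeof("??)");

// True when this translation unit was compiled with trigraphs enabled.
constexpr bool trigraphs_replaced() { return kProbeSize == 2; }

// Hypothetical helper for illustration: reports what the compiler did.
inline void report_trigraph_mode() {
    std::cout << "trigraph replacement is "
              << (trigraphs_replaced() ? "on" : "off") << '\n';
}
```

Calling report_trigraph_mode() from main() and rebuilding with and without -trigraphs (GCC) or /Zc:trigraphs (Visual C++) should flip the message.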

Rob Kennedy answered Oct 10 '22 19:10


An easy way to avoid the trigraph surprise: split the string literal between the two question marks and let the compiler concatenate the pieces:

char* strange = "(Strange??)";
char* strange2 = "(Strange?" "?)";
/*                         ^^^ no punctuation */
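
To convince yourself that the split literal still holds the text you meant, you can compare it with a string assembled one character at a time. This is a minimal sketch; split_literal and expected_text are names invented for the example.

```cpp
#include <cstring>
#include <string>

// Adjacent string literals are concatenated in a later translation phase
// than trigraph replacement, so the two question marks are never next to
// each other while trigraphs are being looked for.
inline const char* split_literal() { return "(Strange?" "?)"; }

// Reference text built character by character, so nothing in this
// function can be mistaken for a trigraph.
inline std::string expected_text() {
    std::string s = "(Strange";
    s += '?';
    s += '?';
    s += ')';
    return s;
}
```

The comparison expected_text() == split_literal() holds whether or not the compiler has trigraphs enabled.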

Edit
gcc has an option to warn about trigraphs: -Wtrigraphs (enabled with -Wall also)
end edit

Quotes from the Standard

    5.2.1.1 Trigraph sequences
1   Before any other processing takes place, each occurrence of one of the
    following sequences of three characters (called trigraph sequences)
    is replaced with the corresponding single character.
           ??=      #               ??)      ]               ??!      |
           ??(      [               ??'      ^               ??>      }
           ??/      \               ??<      {               ??-      ~
    No other trigraph sequences exist. Each ? that does not begin one of
    the trigraphs listed above is not changed.
    5.1.1.2 Translation phases
1   The precedence among the syntax rules of translation is specified by
    the following phases.
         1.   Physical source file multibyte characters are mapped, in an
              implementation-defined manner, to the source character set
              (introducing new-line characters for end-of-line indicators)
              if necessary. Trigraph sequences are replaced by corresponding
              single-character internal representations.
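
The replacement described in 5.2.1.1 is simple enough to sketch by hand. The code below is not how any particular compiler implements it, just an illustration of the nine-entry table applied left to right; trigraph_replacement and replace_trigraphs are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Maps the third character of a trigraph to its replacement, or returns
// '\0' if the character does not complete one of the nine sequences.
inline char trigraph_replacement(char third) {
    switch (third) {
        case '=':  return '#';
        case '(':  return '[';
        case '/':  return '\\';
        case ')':  return ']';
        case '\'': return '^';
        case '<':  return '{';
        case '!':  return '|';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Replaces every trigraph in `source` with its single-character
// equivalent, scanning left to right. A '?' that does not begin a
// trigraph is copied unchanged.
inline std::string replace_trigraphs(const std::string& source) {
    std::string out;
    out.reserve(source.size());
    std::size_t i = 0;
    while (i < source.size()) {
        if (i + 2 < source.size() && source[i] == '?' && source[i + 1] == '?') {
            const char repl = trigraph_replacement(source[i + 2]);
            if (repl != '\0') {
                out += repl;
                i += 3;
                continue;
            }
        }
        out += source[i];
        ++i;
    }
    return out;
}
```

Feeding it the text from the question, written as replace_trigraphs("(Strange?\?)") so the test input itself is safe from the compiler, gives "(Strange]".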
pmg answered Oct 10 '22 18:10


It's a Trigraph!

Adam Wright answered Oct 10 '22 17:10


??) is a trigraph.

Carl Norum answered Oct 10 '22 18:10


That's trigraph support. You can prevent trigraph interpretation by escaping one of the question marks (\? is a standard escape sequence; the closing parenthesis has no escape form):

char* strange = "(Strange?\?)";
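
A quick check that the escape changes only the source text and not the resulting string might look like this. It is a sketch, and escaped_literal is a name made up for the example.

```cpp
#include <cstring>

// Escaping the second '?' breaks up the three-character sequence in the
// source text; the string the program sees still has two question marks.
inline const char* escaped_literal() { return "(Strange?\?)"; }

// '\?' denotes an ordinary question mark.
static_assert('\?' == '?', "the escape yields a plain question mark");
```

The literal is still 11 characters long and ends in two question marks followed by a closing parenthesis.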
R Samuel Klatchko answered Oct 10 '22 18:10