
When did C++ compilers start considering more than two hex digits in string literal character escapes?

I've got a (generated) literal string in C++ that may contain characters that need to be escaped using the \x notation. For example:

char foo[] = "\xABEcho"; 

However, g++ (version 4.1.2 if it matters) throws an error:

test.cpp:1: error: hex escape sequence out of range 

The compiler appears to treat the characters Ec as part of the preceding hex number, because they look like hex digits. Since a four-digit hex number won't fit in a char, an error is raised. For a wide string literal L"\xABEcho", the first character would be U+ABEC, followed by L"ho".

It seems this has changed sometime in the past couple of decades and I never noticed. I'm almost certain that old C compilers would only consider two hex digits after \x, and not look any further.

I can think of one workaround for this:

char foo[] = "\xAB""Echo"; 

but that's a bit ugly. So I have three questions:

  • When did this change?

  • Why doesn't the compiler restrict hex escapes of more than two digits to wide string literals?

  • Is there a workaround that's less awkward than the above?

Greg Hewgill asked Apr 26 '11 01:04





1 Answer

GCC is only following the standard. #877: "Each [...] hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence."

Ignacio Vazquez-Abrams answered Sep 29 '22 03:09
