C/C++: Inherent ambiguity of "\xNNN" format in literal strings

Consider these two strings:

const wchar_t* x = L"xy\x588xla";
const wchar_t* y = L"xy\x588bla";

Reading this, you would expect the two string literals to be identical except for one character: an 'x' in place of a 'b'.
It turns out that this is not the case. The first string compiles to:

x = {'x', 'y', 0x588, 'x', 'l', 'a' }

and the second is actually:

y = {'x', 'y', 0x588b, 'l', 'a' }

They are not even the same length!
The 'b' is a valid hex digit, so it is swallowed by the '\xNNN' hex escape, which keeps consuming hex digits until it reaches a character that is not one.

At the very least, this can cause confusion and subtle bugs in hand-written strings (though you could argue that Unicode strings don't belong in the code body).

But the more serious problem, and the one I am facing, is in auto-generated code. There just doesn't seem to be any way to express this: {'x', 'y', 0x588, 'b', 'l', 'a' } as a literal string without resorting to writing the entire string in hex representation, which is wasteful and unreadable.

Any idea of a way around this?
What's the sense in the language behaving like this?

asked Mar 14 '13 at 22:03 by shoosh




1 Answer

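A simple way is to use compile-time concatenation of adjacent string literals. The '\x' escape ends at the closing quote of the first literal, and the second literal is only appended afterwards, thus: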

wchar_t const* y = L"xy\x588" L"bla";
answered Nov 09 '22 at 09:11 by Cheers and hth. - Alf
