 

Stringize operator failure

The C and C++ standards both include text to the effect that if a stringize operation fails to produce a valid string literal token, the behavior is undefined. In C++11 this is actually possible, by stringizing a raw string literal that contains a newline character. But the catch-all has always been there in the standards.

Is there any other way that stringize can produce UB, where UB or an ill-formed program hasn't already happened?

I'd be interested to hear about any dialect of C or C++ whatsoever. I'm writing a preprocessor.
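For a preprocessor, one way to catch the failure described above is to validate the spelling that `#` produces before treating it as a string literal token. The following is a minimal C sketch. `is_valid_string_literal` is a hypothetical helper, not part of any library. It only handles ordinary unprefixed literals, which is all `#` ever produces, and it checks universal character names for digit count only, not for their value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simple escapes allowed after a backslash: \' \" \? \\ \a \b \f \n \r \t \v */
static bool is_simple_escape(char c) {
    return c != '\0' && strchr("'\"?\\abfnrtv", c) != NULL;
}

static bool is_hex_digit(char c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

/* Hypothetical helper: true if s spells a well-formed ordinary string literal. */
bool is_valid_string_literal(const char *s) {
    size_t n = strlen(s);
    if (n < 2 || s[0] != '"' || s[n - 1] != '"')
        return false;
    size_t last = n - 2; /* index of the last character between the quotes */
    for (size_t i = 1; i <= last; i++) {
        char c = s[i];
        if (c == '\n' || c == '"') /* raw newline or unescaped quote */
            return false;
        if (c != '\\')
            continue;
        if (i == last) /* the backslash would escape the closing quote */
            return false;
        char e = s[i + 1];
        if (is_simple_escape(e)) {
            i += 1;
        } else if (e >= '0' && e <= '7') { /* one to three octal digits */
            size_t j = i + 1;
            while (j <= last && j < i + 4 && s[j] >= '0' && s[j] <= '7')
                j++;
            i = j - 1;
        } else if (e == 'x') { /* \x needs at least one hex digit */
            size_t j = i + 2;
            if (j > last || !is_hex_digit(s[j]))
                return false;
            while (j <= last && is_hex_digit(s[j]))
                j++;
            i = j - 1;
        } else if (e == 'u' || e == 'U') { /* \uXXXX or \UXXXXXXXX */
            size_t need = (e == 'u') ? 4 : 8;
            for (size_t k = 0; k < need; k++)
                if (i + 2 + k > last || !is_hex_digit(s[i + 2 + k]))
                    return false;
            i += 1 + need;
        } else {
            return false; /* e.g. \q is not a valid escape sequence */
        }
    }
    return true;
}
```

For example, stringizing the C++11 raw literal `R"(a` followed by a newline and `b)"` yields a spelling with an unescaped newline between the quotes, which this check rejects.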

Potatoswatter asked Jul 02 '13 01:07






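The stringify (#) operator escapes \ only inside string literals and character constants (where it also escapes "). Indeed, \ has no particular significance outside of a string literal or character constant, except at the end of a line, where it splices that line onto the next. A lone \ is, therefore, an ordinary preprocessing token, of the "non-white-space character that cannot be one of the above" kind (C section 6.4, C++ section 2.5).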

Consequently, if we have

#define Q(X) #X

then

Q(\)

is a legitimate call: the \ is a preprocessing token that is never converted to a token, so it's valid. But you can't stringify \: that would give you "\", in which the backslash escapes the closing quote, so it is not a valid string literal. Hence, the behaviour of the above is undefined.
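To make the escaping rule concrete for a preprocessor implementation, here is a minimal C sketch of the `#` step. The `PPToken` struct and the `stringize` function are hypothetical; the sketch assumes the tokenizer has already recorded each argument token's spelling, whether it is a string literal or character constant, and whether white space preceded it. Because a lone `\` is not a literal, it is copied unchanged, which is how `Q(\)` ends up spelled as `"\"`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token representation for a preprocessor sketch. */
typedef struct {
    const char *spelling;   /* exact source spelling of the pp-token */
    bool is_literal;        /* string literal or character constant */
    bool preceded_by_space; /* white space before it within the argument */
} PPToken;

/* Apply the # rule to one macro argument: copy each token's spelling,
   turn white space between tokens into a single space (leading and
   trailing white space is dropped), and insert a backslash before each
   " and \ inside a string literal or character constant. Any other token,
   including a lone \, is copied unchanged. The caller frees the result. */
char *stringize(const PPToken *toks, size_t count) {
    size_t cap = 3; /* opening quote, closing quote, NUL */
    for (size_t i = 0; i < count; i++)
        cap += 1 + 2 * strlen(toks[i].spelling); /* a space + worst-case escaping */
    char *out = malloc(cap);
    if (out == NULL)
        return NULL;
    size_t n = 0;
    out[n++] = '"';
    for (size_t i = 0; i < count; i++) {
        if (i > 0 && toks[i].preceded_by_space)
            out[n++] = ' ';
        for (const char *p = toks[i].spelling; *p != '\0'; p++) {
            if (toks[i].is_literal && (*p == '"' || *p == '\\'))
                out[n++] = '\\';
            out[n++] = *p;
        }
    }
    out[n++] = '"';
    out[n] = '\0';
    return out;
}
```

The result of `stringize` is only a candidate spelling; a preprocessor still has to decide what to do when, as with a lone `\`, it is not a valid string literal.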

Here's a more amusing test case:

#define Q(A) #A
#define ESCAPE(c) Q(\c)
const char* new_line=ESCAPE(n);
const char* undefined_behaviour=ESCAPE(x);
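The well-defined half of that test case can be checked directly. This sketch assumes a conforming preprocessor: the replacement list `Q(\c)` places the `\` token immediately before the substituted argument with no white space between them, so `Q` stringizes the pair as `\n`, a valid escape sequence. The `tab` variable is added here only for illustration, and `ESCAPE(x)` is deliberately left out because its result, `"\x"`, is not a valid string literal.

```c
/* Macros from the test case above. */
#define Q(A) #A
#define ESCAPE(c) Q(\c)

/* Stringizing the adjacent tokens \ and n gives "\n": one newline character. */
const char *new_line = ESCAPE(n);

/* Likewise \ followed by t gives "\t" (added for illustration). */
const char *tab = ESCAPE(t);

/* Not used: ESCAPE(x) would yield "\x", which is not a valid string
   literal, so its behaviour is undefined. */
```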

A less interesting case of an undefined stringify is where the stringified argument would be too long to be a string literal. (The C++ standard recommends that implementations support string literals of at least 65536 characters, and the C standard requires support for at least 4095, but neither says anything about the maximum size of a macro argument, which could presumably be larger.)

rici answered Sep 23 '22 15:09
