
Why "foo\\<NEWLINE>bar" becomes "foo\bar" after "gcc -E"?

Tags:

c

See the following example:

$ cat foo.c
int main()
{
    char *p = "foo\\
bar";
    return 0;
}
$ gcc -E foo.c
# 1 "foo.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "foo.c"
int main()
{
    char *p = "foo\bar";

    return 0;
}
$

As I understand it, the second \ is escaped by the first \, so the second \ should not combine with the following <NEWLINE> to form a line continuation.

pynexj asked Jan 09 '17 07:01


2 Answers

The preprocessor removes all occurrences of backslash-newline before even trying to tokenize the input; there is no escape mechanism for this. It's not limited to string literals either:

#inclu\
de <st\
dio.h>

int m\
ain(void) {
    /\
* yes, this is a comment */
    put\
s("Hello,\
 world!");
    return 0;
}

This is valid code.

Using \\ to get a single \ only applies to string and character literals and happens much later in processing.

melpomene answered Oct 05 '22 12:10


The rules are quite explicit in ISO/IEC 9899:2011 §5.1.1.2 Translation Phases:

  2. Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice.

The character preceding the final backslash is not consulted. Phase 1, which runs before the splice, converts trigraphs into regular characters. That matters because ??/ is the trigraph for \, so ??/ immediately followed by a new-line is spliced as well.

Jonathan Leffler answered Oct 05 '22 11:10