
How does the line with two backslashes fool C preprocessor?

I decided to look at the source of the standard Haskell module Data.List and found something interesting. The code is evidently run through the C preprocessor first, as a bunch of #ifdefs suggest, and only then compiled by the Haskell compiler. However, the C preprocessor is not very friendly to source code other than C, as its documentation points out:

The C preprocessor is intended to be used only with C, C++, and Objective-C source code. In the past, it has been abused as a general text processor. It will choke on input which does not obey C's lexical rules. For example, apostrophes will be interpreted as the beginning of character constants, and cause errors. Also, you cannot rely on it preserving characteristics of the input which are not significant to C-family languages. If a Makefile is preprocessed, all the hard tabs will be removed, and the Makefile will not work.

Yet somehow the Haskell code survives C preprocessing. This snippet probably gives a clue:

#ifdef __GLASGOW_HASKELL__
import GHC.Num
import GHC.Real
import GHC.List
import GHC.Base
#endif

infix 5 \\ -- comment to fool cpp

-- -----------------------------------------------------------------------------
-- List functions

How does this "comment to fool cpp" work? It looks like an interesting hack, but I could not find anything on the topic. In Haskell this line declares the infix operator \\ with precedence 5, and all the text after -- is ignored. But what does it do to the C preprocessor, and in what way is the preprocessor actually "fooled"?

Wolfram asked Feb 11 '17 21:02


1 Answer

If you simply write this:

infix 5 \\

the C preprocessor issues the following message when the line is at the end of the file:

infix 5 \foo.c:8:10: warning: backslash-newline at end of file

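And if it's not at the end of the file (thanks @commenter), the preprocessor treats the final backslash together with the newline that follows it as a line continuation: it removes both and joins the next line onto this one. Only one backslash is left, so the output is incorrect on the Haskell side: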

infix 5 \
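
The backslash-newline behaviour described above can be sketched in C. This is a minimal illustration, not the preprocessor's actual code: splice_lines and demo are hypothetical names, and the helper models only the rule that a backslash immediately followed by a newline is deleted together with that newline.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies src to dst, deleting every backslash that
   is immediately followed by a newline, along with that newline.
   dst must have room for strlen(src) + 1 bytes. */
void splice_lines(const char *src, char *dst) {
    size_t j = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] == '\\' && src[i + 1] == '\n') {
            i++;        /* skip the newline as well as the backslash */
            continue;
        }
        dst[j++] = src[i];
    }
    dst[j] = '\0';
}

/* Usage sketch: the bare operator loses a backslash and its line break,
   while the commented line passes through unchanged. */
void demo(void) {
    char out[64];

    splice_lines("infix 5 \\\\\nnext line\n", out);
    assert(strcmp(out, "infix 5 \\next line\n") == 0);

    splice_lines("infix 5 \\\\ -- comment to fool cpp\nnext line\n", out);
    assert(strcmp(out, "infix 5 \\\\ -- comment to fool cpp\nnext line\n") == 0);
}
```

In the first call the second backslash sits directly before the newline, so both disappear and "next line" is joined onto the declaration. In the second call the trailing comment separates the backslashes from the newline, so nothing is removed.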

BUT if you add a Haskell-style comment afterwards, Haskell ignores it (obviously!), and the line is no longer a problem for the C preprocessor, since \ is not the last character on the line:

infix 5 \\ -- comment

cpp emits exactly the same text, and Haskell can parse the interesting part, discarding the -- comment.

(Note that this line is never valid C, but the preprocessor doesn't mind.)

Note: the problem is similar if you want to end a C/C++ // comment line with \. You can't, because the backslash continues the comment onto the next line, which is not what you want.
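
The // comment case can be shown with a short C sketch. The function names are hypothetical and chosen only for illustration. Compilers commonly warn about a multi-line comment here, but the code is still accepted.

```c
#include <assert.h>

/* Hypothetical demo: the trailing backslash splices the next source
   line into the // comment, so the assignment never runs. */
int comment_continuation_demo(void) {
    int x = 1;
    // this comment ends with a backslash, so it swallows the next line \
    x = 2;
    return x;   /* x is still 1 */
}

/* Usage sketch: confirms that the assignment was commented out. */
void check_demo(void) {
    assert(comment_continuation_demo() == 1);
}
```

The line x = 2; looks like code, but after line splicing it is part of the comment, which is the same mechanism that removes a backslash from the unprotected infix declaration.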

Jean-François Fabre answered Nov 07 '22 08:11