
Purpose of Trigraph sequences in C++?

People also ask

What is Trigraph C++?

A few characters have an alternative representation, called a trigraph sequence. A trigraph is a three-character sequence that represents a single character. The sequence always starts with two question marks. The third character determines which character the sequence represents.
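To make the mapping concrete, here is a rough sketch of the replacement step written as an ordinary C function. The nine pairs are the ones defined by the C and C++ standards; the names `trigraph_replacement` and `replace_trigraphs` are made up for illustration, and a real compiler does this while reading the source file, not on strings at run time.

```c
#include <stddef.h>
#include <string.h>

/* Map the third character of a trigraph to the character it stands for.
   Returns 0 when `third` does not complete a trigraph. */
static char trigraph_replacement(char third)
{
    static const char thirds[] = "=/'()!<>-";
    static const char values[] = "#\\^[]|{}~";
    const char *hit;

    if (third == '\0')
        return 0;
    hit = strchr(thirds, third);
    return hit ? values[hit - thirds] : 0;
}

/* Hypothetical helper: copy `src` into `dst`, replacing every trigraph.
   `dst` must have room for strlen(src) + 1 bytes. */
void replace_trigraphs(const char *src, char *dst)
{
    while (*src != '\0') {
        char repl = 0;

        /* A trigraph is two question marks plus one of nine characters. */
        if (src[0] == '?' && src[1] == '?')
            repl = trigraph_replacement(src[2]);

        if (repl != 0) {
            *dst++ = repl;
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

The lookup strings deliberately contain no two adjacent question marks, so the sketch itself reads the same whether or not the compiler has trigraphs enabled.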

What is a digraph in programming?

In computer programming, digraphs and trigraphs are sequences of two and three characters, respectively, that appear in source code and, according to a programming language specification, should be treated as if they were single characters.


This question (about the closely related digraphs) has the answer.

It boils down to the fact that the ISO 646 character set doesn't have all the characters of the C syntax, so there are some systems with keyboards and displays that can't deal with the characters (though I imagine that these are quite rare nowadays).

In general, you don't need to use them, but you need to know about them for exactly the problem you ran into. Trigraphs are the reason the '?' character has an escape sequence:

'\?'

So a couple of ways you can avoid your example problem are:

 printf( "What?\?!\n" ); 

 printf( "What?" "?!\n" ); 

But you have to remember when you're typing the two '?' characters that you might be starting a trigraph (and it's certainly never something I'm thinking about).
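As a quick sanity check, the two spellings can be wrapped in functions and compared. This is only a sketch; the function names are invented for illustration. Both literals should come out as the same eight characters whether or not the compiler has trigraphs enabled, because neither one puts two question marks next to each other in the source.

```c
#include <string.h>

/* Spelling 1: escape the second question mark. */
const char *what_escaped(void)
{
    return "What?\?!\n";
}

/* Spelling 2: split the literal; adjacent literals are joined
   in a later translation phase than trigraph replacement. */
const char *what_concatenated(void)
{
    return "What?" "?!\n";
}

/* Returns 1 when both spellings produce the same text. */
int spellings_match(void)
{
    return strcmp(what_escaped(), what_concatenated()) == 0;
}
```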

In practice, trigraphs and digraphs are something I don't worry about at all on a day-to-day basis. But you should be aware of them because once every couple of years you'll run into a bug related to them (and you'll spend the rest of the day cursing their existence). It would be nice if compilers could be configured to warn (or error) when they come across a trigraph or digraph, so I could know I've got something I should knowingly deal with.

And just for completeness, digraphs are much less dangerous since they get processed as tokens, so a digraph inside a string literal won't get interpreted as a digraph.
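A small sketch of that difference, assuming a compiler that accepts the standard digraphs (C95 or later, or any C++): `<:` `:>` stand for `[` `]` and `<%` `%>` stand for `{` `}` when they appear as tokens, but the same characters inside a string literal are left alone. The function names are made up for the example.

```c
/* Digraph tokens used in place of brackets and braces. */
int sum_with_digraphs(void)
<%
    int values<:3:> = <% 1, 2, 3 %>;
    return values<:0:> + values<:1:> + values<:2:>;
%>

/* Inside a string literal the same characters stay as written. */
const char *digraph_text(void)
{
    return "<% %>";
}
```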

For a nice education on various fun with punctuation in C/C++ programs (including a trigraph bug that would definitely have me pulling my hair out), take a look at Herb Sutter's GOTW #86 article.


Addendum:

It looks like GCC will not process (and will warn about) trigraphs by default. Some other compilers have options to turn off trigraph support (IBM's, for example). Microsoft started supporting a warning (C4837) in VS2008 that must be explicitly enabled (using /Wall or something).


Kids today! :-)

Yes, foreign equipment, such as an IBM 3270 terminal. The 3270 has, if I remember, no curly braces! If you wanted to write C on an IBM mini / mainframe, you had to use the wretched trigraphs for every block boundary. Fortunately, I only had to write software in C to emulate some IBM minicomputer facilities, not actually write C software on the System/36.

Look next to the "P" key:

[image: keyboard]

Hmmm. Hard to tell. There is an extra button next to "carriage return", and I might have it backwards: maybe it was the "[" / "]" pair that was missing. At any rate, this keyboard would cause you grief if you had to write C.

Also, these terminals display EBCDIC, IBM's "native" mainframe character set, not ASCII (thanks, Pavel Minaev, for the reminder).

On the other hand, like the GNU C guide says: "You don't need this brain damage." The gcc compiler leaves this "feature" disabled by default.


From The C++ Programming Language Special Edition, page 829

The ASCII special characters [, ], {, }, |, and \ occupy character set positions designated as alphabetic by ISO. In most European national ISO-646 character sets, these positions are occupied by letters not found in the English alphabet.

A set of trigraphs is provided to allow national characters to be expressed in a portable way using a truly standard minimal character set. This can be useful for interchange of programs, but it doesn't make it easier for people to read programs. Naturally, the long-term solution to this problem is for C++ programmers to get equipment that supports both their native language and C++ well. Unfortunately, this appears to be infeasible for some, and the introduction of new equipment can be a frustratingly slow process.


They are for use on systems that lack some of the characters in C++'s basic character set. Needless to say, such systems are exceedingly rare.


Trigraphs have been proposed for removal in C++0x. That said, there still seems to be a strong argument in support of them; see C++ committee paper N2910, which discusses this. Apparently, EBCDIC is one major stronghold where they are needed. (They were eventually removed from the language in C++17.)


I've seen trigraphs used in the early '90s to help convert PL/I programs from a mainframe to be run/compiled/debugged on a PC.

They were dabbling with editing PL/I on the PC using a PL/I to C compiler and they wanted the code to work when moved back to the mainframe which did not support curly braces. I suggested that they could use macros like

#define BEGIN {
#define END }

or as a friendlier PL/I alternative

#define BEGIN ??<
#define END ??>

and if they really wanted to get fancy they could try

#ifdef MAINFRAME
    #define BEGIN ??<
    #define END ??>
#else
    #define BEGIN {
    #define END }
#endif

and then the program would look like it was written in Pascal. They just looked at me funny and wouldn't speak to me for the rest of the day. I don't think I blame them. :)
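For the curious, this is roughly what code written against those macros would look like. It is only a sketch using the plain-brace definitions, with the directive spelled `#define`; `sum_to` is a made-up example function, and the trigraph variant would additionally need a compiler with trigraph support switched on.

```c
/* BEGIN and END stand in for the braces. */
#define BEGIN {
#define END }

/* Hypothetical example function: sums the integers 1..n. */
int sum_to(int n)
BEGIN
    int total = 0;
    int i;

    for (i = 1; i <= n; ++i)
    BEGIN
        total += i;
    END
    return total;
END
```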

What killed the effort was not the trigraphs; it was the I/O system differences between the platforms. Opening files on the PC was so different from the mainframe that it would have introduced way too many kludges to keep the same code running on both.