C++17 removed trigraphs. IBM heavily opposed this (here and here), so there seem to be arguments on both sides of the removal.
But since the decision was made to remove trigraphs, why leave digraphs? I don't see any reasons for keeping digraphs beyond the reasons for keeping trigraphs (which apparently didn't weigh enough to keep them).
Trigraph sequences allow C programs to be written using only the ISO (International Organization for Standardization) Invariant Code Set. Trigraphs are sequences of three characters (introduced by two consecutive question marks) that the compiler replaces with their corresponding punctuation characters.
Various reasons exist for using digraphs and trigraphs: keyboards may not have keys to cover the entire character set of the language, input of special characters may be difficult, text editors may reserve some characters for special use and so on.
Some multi-character sequences use only two characters and are known as digraphs. Unlike trigraphs, digraphs are tokens. If a digraph occurs inside another token (e.g. a string literal or character constant), it is not treated as a digraph but remains as it is.
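To make the "digraphs are tokens" point concrete, here is a minimal sketch. The function name digraph_demo is made up for illustration; the digraph spellings themselves (<% %> for braces, <: :> for square brackets) are the standard ones.

```cpp
#include <string>

// Digraphs are alternative spellings of tokens:
//   <%  %>  stand for  {  }
//   <:  :>  stand for  [  ]
// digraph_demo is a hypothetical name used only for this illustration.
std::string digraph_demo()
<%
    int values<:3:> = <% 1, 2, 3 %>;         // same as: int values[3] = { 1, 2, 3 };
    std::string text = "<: stays as-is :>";  // inside a string literal nothing is replaced
    return text + " " + std::to_string(values<:1:>);
%>
```

Calling digraph_demo() should return the text "<: stays as-is :> 2": the digraphs used as tokens behave like the punctuation they stand for, while the ones inside the string literal are left alone.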
A trigraph is a three-character sequence that represents a single character. The sequence always starts with two question marks.
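As a rough illustration of that replacement, the sketch below applies the nine trigraph mappings to a piece of text. It is not how any particular compiler implements it, and the names trigraph_replacement and replace_trigraphs are invented for this example. The point is that the substitution works on raw characters, with no knowledge of strings or comments.

```cpp
#include <cstddef>
#include <string>

// Maps the third character of a trigraph to its replacement character.
// Returns '\0' if two question marks followed by c do not form a trigraph.
// (Hypothetical helper, for illustration only.)
char trigraph_replacement(char c)
{
    switch (c) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Illustrative stand-in for the replacement a pre-C++17 compiler performs
// on the raw source text, before string literals and comments are recognized.
std::string replace_trigraphs(const std::string& source)
{
    std::string result;
    std::size_t i = 0;
    while (i < source.size()) {
        if (i + 2 < source.size() && source[i] == '?' && source[i + 1] == '?') {
            const char replacement = trigraph_replacement(source[i + 2]);
            if (replacement != '\0') {
                result += replacement;
                i += 3;  // consume the whole three-character sequence
                continue;
            }
        }
        result += source[i];
        ++i;
    }
    return result;
}
```

The literals passed to this function are best written with the \? escape (for example "What?\?!?\?!"), so that the example source itself contains no trigraphs when built with an older standard mode.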
Trigraphs are more problematic to the unaware user than digraphs. This is because they are replaced within string literals and comments. Here are some examples…
Example A:
    std::string example = "What??!??!";
    std::cout << example << std::endl;

What|| will be printed to the console. This is because of the trigraph ??! being translated to |.
Example B:
    // Error ?!?!?!??!??/
    std::cout << "There was an error!" << std::endl;

Nothing will happen at all. This is because ??/ translates to \, which escapes the newline character and results in the next line being commented out.
Example C:
    // This makes no sense ?!?!!?!??!??/
    std::string example = "Hello World";
    std::cout << example << std::endl;

This will give an error along the lines of: use of undeclared identifier "example", for the same reasons as Example B.
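The mechanism behind Examples B and C can be sketched as two text transformations applied one after the other: first the backslash trigraph is replaced, then every backslash followed directly by a newline is deleted, which joins the two lines. The helper splice_lines below is a hypothetical simplification written for this answer (it handles only the backslash trigraph), not a description of any compiler's actual code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: mimics the two earliest pre-C++17 translation steps
// that matter for Examples B and C.
//   1. Replace the backslash trigraph (question mark, question mark, slash).
//   2. Delete every backslash that is immediately followed by a newline.
std::string splice_lines(const std::string& source)
{
    // Step 1: trigraph replacement, restricted to the backslash trigraph.
    std::string replaced;
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (i + 2 < source.size() && source[i] == '?' &&
            source[i + 1] == '?' && source[i + 2] == '/') {
            replaced += '\\';
            i += 2;  // skip the rest of the trigraph
        } else {
            replaced += source[i];
        }
    }

    // Step 2: line splicing.
    std::string spliced;
    for (std::size_t i = 0; i < replaced.size(); ++i) {
        if (replaced[i] == '\\' && i + 1 < replaced.size() && replaced[i + 1] == '\n') {
            ++i;  // drop the backslash and the newline that follows it
        } else {
            spliced += replaced[i];
        }
    }
    return spliced;
}
```

Fed a comment line that ends in the backslash trigraph plus the statement on the next line, this returns a single line that still starts with //, which is why the statement ends up inside the comment.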
There are far more elaborate problems trigraphs can cause too, but you get the idea. It's worth noting that many compilers emit a warning when such translations are made, which is yet another reason to always treat warnings as errors. However, the standard does not require this warning, so it cannot be relied upon.
Digraphs are much less problematic than trigraphs, as they are not replaced inside another token (i.e. a string or character literal) and there is no digraph that translates to \, so escaping newlines in comments cannot occur.
Conclusion
Other than making code harder to read, digraphs cause fewer problems than trigraphs, so the need to remove them is greatly reduced.