I'm starting to learn C#, and I don't understand why regular string literals (i.e., " ") cannot contain literal newline characters. (I'm not talking about the escape sequence \n.) I know that you must use verbatim string literals (i.e., @" ") for multiline strings, but why?
I've not seen it explicitly stated that you cannot use them in regular strings. More than that, except where it's mentioned in passing that I can use verbatim strings for this, everything I've read seems to suggest that literal newline characters would be allowed in regular string literals.
Beginning Visual C# 2010 and Code: Generating Multiline String Literals (Visual C#) show examples of verbatim multiline strings with no further explanation.
Learning C# 3.0 says this:
In the C# language, spaces, tabs, and newlines are considered to be whitespace.... Extra whitespace is generally ignored in C# statements. ... The exception to this rule is that whitespace within a string is treated as literal; it is not ignored.
So it's literal? That's what I would expect too, but it's not.
It even includes this tip box:
Tip
Visual Basic programmers take note: in C#, the end-of-line has no special significance. Statements are ended with semicolons, not newline characters. There is no line continuation character because none is needed.
(I realize that this is talking about outside of strings, but why would end-of-line have special parsing significance inside a string if it doesn't outside a string?)
Having finally found my way to the string (C# Reference) itself, I still garnered no insight:
String literals can contain any character literal. Escape sequences are included. The following example uses escape sequence \\ for backslash, \u0066 for the letter f, and \n for newline.
It says that escape sequences can be used, but it does not say they must be used. Are literal newline characters not included in "any character literal"? If I have a string that contains a literal tab character instead of its escape sequence \t, there is no error. But if I have a literal newline, I get an error. I've even changed the file's line endings from \r\n to \n or \r to no effect.
Obviously, I'm able to infer from examples and from Visual Studio errors that a verbatim string is required if it contains a literal newline character, but everything I've read suggests that shouldn't be the case. Why the difference?
Well, shoot. Right as I was submitting this, I found the answer.
Are literal newline characters not included in "any character literal"?
Apparently, no, they aren't.
2.4.4.4 Character literals:
character-literal:
    ' character '
character:
    single-character
single-character:
    Any character except ' (U+0027), \ (U+005C), and new-line-character
Likely a duplicate of "Why must C/C++ string literal declarations be single-line?"
In a nutshell, because the C language doesn't support it.
If newlines were allowed, a typo that leaves a string literal unclosed would swallow the rest of the file as a single token, leaving the programmer with a compiler error message along the lines of "expecting a semicolon at line xxx, column yyy", where the indicated location is the end of the source file.
Multi-line literals are also uncommon in practice, so from a usability perspective it is better to require that they be marked explicitly.
Further, in the constrained environment the C language was developed in (8K PDP-11?), I suspect that sort of overflow might crash the compiler.
The C language does support literal splicing, though, which is helpful. Adjacent string literals are concatenated into a single string:
char *txt = "this is line 1\n"
            "this is line 2\n"
            "this is line 3\n";
It also supports line splicing, where a backslash at the very end of a line joins that line to the next one:
char *txt = "this is my\n\
multi-line string literal\n\
isn't it nice?\n" ;
Features that I wish C# had.
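For what it's worth, here is a minimal sketch of how the two C forms above compare. It is my own illustration, not something taken from the linked question: the names concatenated, spliced, indented, same_text and indented_length are made up for the example. It also shows the one catch with line splicing, which is that any indentation on the continuation line ends up inside the string.

```c
#include <stddef.h>
#include <string.h>

/* Adjacent string literals are concatenated into one literal by the compiler. */
static const char *concatenated =
    "this is line 1\n"
    "this is line 2\n";

/* A backslash immediately followed by a newline is deleted before the
   literal is tokenized, so the literal simply continues on the next
   source line. The continuation starts at column 0 on purpose. */
static const char *spliced = "this is line 1\n\
this is line 2\n";

/* Same technique, but the continuation line is indented by four spaces.
   Those four spaces become part of the string. */
static const char *indented = "line 1\n\
    line 2\n";

/* Hypothetical helper: returns 1 if both spellings yield the same text. */
int same_text(void)
{
    return strcmp(concatenated, spliced) == 0;
}

/* Hypothetical helper: length of the indented variant, spaces included. */
size_t indented_length(void)
{
    return strlen(indented);
}
```

Under these assumptions same_text() should report that the two spellings are identical, while indented_length() counts the four extra spaces. That is the main reason I'd usually reach for adjacent literals rather than backslash continuation.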