I'm learning about raw strings in C++ from a cplusplus.com tutorial on constants. Based on the definition on that site, a raw string should start with R"sequence( and end with )sequence", where sequence can be any sequence of characters.
One of the examples on the website is the following:
R"&%$(string with \backslash)&%$"
However, when I try to compile the code that contains the above raw string, I get a compilation error.
test.cpp:5:28: error: invalid character '$' in raw string delimiter
5 | std::string str = R"&%$(string with \backslash)&%$";
| ^
test.cpp:5:23: error: stray 'R' in program
I tried it with g++ and clang++ on both Windows and Linux. None of them worked.
A Python raw string is created by prefixing a string literal with 'r' or 'R'. A Python raw string treats the backslash (\) as a literal character. This is useful when we want a string that contains backslashes and don't want them to be treated as escape characters.
A raw string is a string literal (prefixed with an r) in which the normal escaping rules have been suspended so that everything is a literal.
A raw string in programming keeps every character of the string literal exactly as written in the source, rather than giving characters such as the backslash their usual escaping function. Raw strings are denoted with the letter r or capital R (C++ requires the capital R), and might look something like this: R"(hello)"
Raw strings are string literals that treat the backslash (\) as a literal character. For example, printing an ordinary string that contains "\n" produces a line break. But if the string is marked as raw, the "\n" is printed as the two ordinary characters \ and n.
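The same difference can be checked in C++. The following is a minimal sketch, assuming a C++17 compiler; the names escaped and raw are made up for illustration. The static_assert lines make the compiler itself confirm the lengths.

```cpp
#include <string_view>

// Ordinary literal: the compiler translates \n into one newline character.
constexpr std::string_view escaped = "a\nb";

// Raw literal: the backslash and the 'n' stay as two separate characters.
constexpr std::string_view raw = R"(a\nb)";

// 'a', newline, 'b'
static_assert(escaped.size() == 3);
// 'a', backslash, 'n', 'b'
static_assert(raw.size() == 4);
// The raw literal equals the ordinary literal with the backslash escaped.
static_assert(raw == "a\\nb");
```

Printing raw with std::cout would show the backslash and the n literally, while printing escaped would break the line.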
From C++ reference:
delimiter: A character sequence made of any source character but parentheses, backslash and spaces (can be empty, and at most 16 characters long)
Note the "any source character" part here.
Let us look at what the standard says:
From [gram.lex]:
raw-string:
    " d-char-sequence(opt) ( r-char-sequence(opt) ) d-char-sequence(opt) "
...
d-char-sequence:
    d-char
    d-char-sequence d-char
d-char:
    any member of the basic source character set except: space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters representing horizontal tab, vertical tab, form feed, and newline.
Well, what is the basic source character set? From [lex.charset]:
The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:
a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
... which does not include $; so the conclusion is that the dollar sign $ cannot be part of the delimiter sequence.
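The rule quoted above can be turned into a small checker. This is only a sketch of the wording cited in this answer, assuming C++17; is_valid_d_char and is_valid_delimiter are hypothetical helper names, not part of any library, and a compiler implementing a different revision of the standard may accept a different set of characters.

```cpp
#include <string_view>

// Hypothetical helper: true if c may appear in a raw string delimiter
// according to the d-char rule quoted above.
constexpr bool is_valid_d_char(char c) {
    // The 91 graphical characters of the basic source character set,
    // minus '(', ')' and '\\'. Space and the control characters are
    // rejected simply because they are not listed here.
    constexpr std::string_view allowed =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "_{}[]#<>%:;.?*+-/^&|~!=,\"'";
    return allowed.find(c) != std::string_view::npos;
}

// Hypothetical helper: true if delim is an acceptable delimiter.
constexpr bool is_valid_delimiter(std::string_view delim) {
    if (delim.size() > 16) return false;  // length limit from the reference quote
    for (char c : delim) {
        if (!is_valid_d_char(c)) return false;
    }
    return true;  // the empty delimiter is allowed
}

static_assert(is_valid_delimiter("&%"));    // fine: both are basic characters
static_assert(!is_valid_delimiter("&%$"));  // rejected: '$' is not in the set
```

The second static_assert mirrors the compiler error from the question: the delimiter &%$ fails only because of the $.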
For the basic source character set, see lex.charset 5.3 (1): that set does not contain the $ character. For the characters allowed in the delimiter of a raw string literal, see lex.string 5.13.5: "/…/ any member of the basic source character set except: space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters representing horizontal tab, vertical tab, form feed, and newline." (emphasis mine).
Just remove the $, as in the code below:
string string3 = R"&%(string with \backslash)&%";
The $ gives an error because the basic source character set does not have $, as said in the comments.
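As a quick check that dropping the $ leaves the string contents unchanged, here is a small sketch, assuming C++17; the names fixed and no_delim are made up for illustration. The delimiter is not part of the resulting string, so any valid delimiter (including none) yields the same characters.

```cpp
#include <string_view>

// The example from the question with the '$' removed from the delimiter.
constexpr std::string_view fixed = R"&%(string with \backslash)&%";

// The same content written with an empty delimiter.
constexpr std::string_view no_delim = R"(string with \backslash)";

// The delimiter does not appear in the value, and the backslash is kept.
static_assert(fixed == "string with \\backslash");
static_assert(fixed == no_delim);
```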
1) The individual bytes of the source code file are mapped (in implementation-defined manner) to the characters of the basic source character set. In particular, OS-dependent end-of-line indicators are replaced by newline characters. The basic source character set consists of 96 characters:
a) 5 whitespace characters (space, horizontal tab, vertical tab, form feed, new-line)
b) 10 digit characters from '0' to '9'
c) 52 letters from 'a' to 'z' and from 'A' to 'Z'
d) 29 punctuation characters: _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
2) Any source file character that cannot be mapped to a character in the basic source character set is replaced by its universal character name (escaped with \u or \U) or by some implementation-defined form that is handled equivalently.