
What is the proper format of writing raw strings with '$' in C++?

Tags: c++, string

I'm learning about raw strings in C++ from the cplusplus.com tutorial on constants. According to the definition on that site, a raw string starts with R"sequence( and ends with )sequence", where sequence can be any sequence of characters.

One of the examples on the website is the following:

R"&%$(string with \backslash)&%$"

However, when I try to compile the code that contains the above raw string, I get a compilation error.

test.cpp:5:28: error: invalid character '$' in raw string delimiter
    5 |     std::string str = R"&%$(string with \backslash)&%$";
      |                       ^
test.cpp:5:23: error: stray 'R' in program

I tried it with g++ and clang++ on both Windows and Linux, and it failed to compile with every combination.

Amirreza A. asked Feb 27 '21 16:02




3 Answers

From C++ reference:

delimiter: A character sequence made of any source character but parentheses, backslash and spaces (can be empty, and at most 16 characters long)

Note the "any source character" part here.

Let us look at what the standard says:

From [gram.lex]:

raw-string:
  " d-char-sequence_opt ( r-char-sequence_opt ) d-char-sequence_opt "

...

d-char-sequence:
  d-char
  d-char-sequence d-char

d-char:
  any member of the basic source character set except: space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters representing horizontal tab, vertical tab, form feed, and newline.

Well, what is the basic source character set? From [lex.charset]:

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:

a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

... which does not include $; so the conclusion is that the dollar sign $ cannot be part of the delimiter sequence.
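The rule quoted above can be sketched as a small run-time check. This is only an illustration of the quoted grammar, not how any compiler implements it; `kAllowedDelimiterChars`, `is_valid_raw_delimiter`, and `check_delimiter_examples` are hypothetical names introduced here, and the 16-character limit comes from the C++ reference quote at the top of this answer.

```cpp
#include <cassert>
#include <string>

// The 91 graphical characters of the basic source character set quoted
// above, minus the three a delimiter may not contain: ( ) and backslash.
// Space and the control characters are excluded simply by not being listed.
const std::string kAllowedDelimiterChars =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#<>%:;.?*+-/^&|~!=,\"'";

// Hypothetical helper: true if `delimiter` could be used as the
// d-char-sequence of a raw string literal under the quoted rules.
bool is_valid_raw_delimiter(const std::string& delimiter) {
    if (delimiter.size() > 16) return false;  // at most 16 characters
    // Every character must come from the allowed set; an empty delimiter passes.
    return delimiter.find_first_not_of(kAllowedDelimiterChars) == std::string::npos;
}

// A few checks mirroring the cases discussed in this thread.
void check_delimiter_examples() {
    assert(is_valid_raw_delimiter(""));      // the delimiter may be empty
    assert(is_valid_raw_delimiter("&%"));    // the question's delimiter without '$'
    assert(!is_valid_raw_delimiter("&%$"));  // '$' is not in the basic source character set
    assert(!is_valid_raw_delimiter("a b"));  // space is excluded
    assert(!is_valid_raw_delimiter("\\"));   // backslash is excluded
}
```

Calling `check_delimiter_examples()` from `main` runs the checks; the delimiter `&%$` from the question is rejected only because of the `$`.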

ph3rin answered Oct 18 '22 03:10


For the basic source character set, see [lex.charset] 5.3 (1): that set does not contain the $ character. For the characters allowed in a raw string delimiter, see [lex.string] 5.13.5: "/…/ any member of the basic source character set except: space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters representing horizontal tab, vertical tab, form feed, and newline." (emphasis mine).

heap underrun answered Oct 18 '22 03:10


Just remove the $, as in the code below:

string string3 = R"&%(string with \backslash)&%";

The $ causes an error because the basic source character set does not contain it, as noted in the comments.

  1. The individual bytes of the source code file are mapped (in implementation-defined manner) to the characters of the basic source character set. In particular, OS-dependent end-of-line indicators are replaced by newline characters. The basic source character set consists of 96 characters:

     a) 5 whitespace characters (space, horizontal tab, vertical tab, form feed, new-line)

     b) 10 digit characters from '0' to '9'

     c) 52 letters from 'a' to 'z' and from 'A' to 'Z'

     d) 29 punctuation characters: _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

  2. Any source file character that cannot be mapped to a character in the basic source character set is replaced by its universal character name (escaped with \u or \U) or by some implementation-defined form that is handled equivalently.

Ref: cppreference.com, "Phases of translation"
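As a quick way to confirm that the corrected literal from the start of this answer holds what you expect, here is a minimal sketch comparing it with an ordinary escaped literal. The function names `raw_with_valid_delimiter` and `escaped_equivalent` are made up for this illustration.

```cpp
#include <string>

// The string from the question, with '$' dropped from the delimiter.
// Inside a raw string the backslash is an ordinary character.
std::string raw_with_valid_delimiter() {
    return R"&%(string with \backslash)&%";
}

// The same text as an ordinary literal: here the backslash must be escaped.
std::string escaped_equivalent() {
    return "string with \\backslash";
}
```

Both functions should return the same 22-character string, with a single backslash before the word "backslash".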

Rohith V answered Oct 18 '22 02:10