Are there raw strings in R for regular expressions?

Tags: regex, r

In Python you can use raw strings:

import re
re.sub(r"\\", ":", "back\\slash")  # r"\\" instead of "\\\\"

Does this exist in R as well? For example, here is an equivalent code snippet without raw strings in R:

library(stringr)
str_replace("back\\slash", "\\\\", ":")
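Why four backslashes? This short base R sketch shows the two layers of escaping. It uses sub() in place of stringr's str_replace(), since both replace the first regex match, so it runs without extra packages. The variable names are illustrative.

```r
# Layer 1: the R parser turns the literal "\\\\" into two backslash characters
pattern <- "\\\\"
print(nchar(pattern))          # [1] 2

# The subject string "back\\slash" holds a single backslash
x <- "back\\slash"
print(nchar(x))                # [1] 10

# Layer 2: the regex engine reads \\ as an escaped, literal backslash
print(sub(pattern, ":", x))    # [1] "back:slash"
```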

I would love to be able to do this:

str_replace("back\\slash", raw("\\"), ":")

Does this functionality already exist, or should I just implement my own function raw()?
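One caveat if you write your own helper: base R already has a raw() function, which creates raw vectors, so a user-defined raw() would mask it. The following minimal sketch uses a different, hypothetical name, escape_regex(). It backslash-escapes every regex metacharacter, including the backslash itself, so that the result matches the input literally.

```r
# Hypothetical helper, not part of base R or stringr.
# Escapes regex metacharacters so the input is matched as literal text.
escape_regex <- function(x) {
  # PCRE character class: . \ | ( ) [ ] { } ^ $ * + ?
  # Each match is replaced by a backslash followed by the matched character.
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

x <- "back\\slash"
print(escape_regex("\\"))                  # [1] "\\\\"
print(sub(escape_regex("\\"), ":", x))     # [1] "back:slash"

# Without escaping, "." matches any character; escaped, only a literal dot
print(sub(".", "!", "a.b"))                # [1] "!.b"
print(sub(escape_regex("."), "!", "a.b"))  # [1] "a!b"
```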

asked Feb 28 '16 at 22:02 by Megatron


2 Answers

For your example, an alternative for R < 4.0.0 is to go the other way round: use a function that automatically escapes special characters. The stringi package helps here.

library(stringr)
library(stringi)
str_replace("back\\slash", stri_escape_unicode("\\"), ":")

Since this is verbose, defining r <- stri_escape_unicode comes close to your desired functionality: r("\\").

The stringi package also has stri_unescape_unicode() to reverse the escaping, which is useful in Shiny apps, where user inputs are automatically escaped.
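There is one caveat, plus an alternative the answer does not mention. stri_escape_unicode() escapes string-literal special characters (the backslash, quotes, control and non-ASCII characters), not regex metacharacters. So r(".") would still be the pattern "." and match any character. When the pattern should be taken literally, base R's sub() and gsub() accept fixed = TRUE, and stringr's fixed() does the same job, e.g. str_replace("back\\slash", fixed("\\"), ":"). Here is a base R sketch:

```r
x <- "back\\slash"

# fixed = TRUE: the pattern is a literal string, so one backslash suffices
print(sub("\\", ":", x, fixed = TRUE))         # [1] "back:slash"

# Regex metacharacters such as "." also lose their special meaning
print(gsub(".", "!", "a.b.c", fixed = TRUE))   # [1] "a!b!c"
```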

answered Oct 25 '22 at 22:10 by TimTeaFan


As of R 4.0.0 this is available.

Raw character constants are also available using a syntax similar to the one used in C++: r"(...)" with ... any character sequence, except that it must not contain the closing sequence )". The delimiter pairs [] and {} [c]an also be used. For additional flexibility, a number of dashes can be placed between the opening quote and the opening delimiter, as long as the same number of dashes appear between the closing delimiter and the closing quote.
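The rules quoted above can be checked quickly. This sketch requires R >= 4.0.0, and the strings are illustrative. The three delimiter pairs give identical results, and adding a dash lets the sequence )" appear inside the string.

```r
# All three delimiter pairs produce the same string; backslashes stay literal
paren   <- r"(C:\path\file)"
bracket <- r"[C:\path\file]"
brace   <- r"{C:\path\file}"
print(identical(paren, bracket) && identical(bracket, brace))  # [1] TRUE
print(nchar(paren))                                            # [1] 12

# With one dash the closing sequence becomes )-" so )" may appear inside
s <- r"-(say ")" here)-"
writeLines(s)                                                  # say ")" here
```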

From the example in ?Quotes:

r"{(\1\2)}"
## [1] "(\\1\\2)"

(Note: the doubled backslashes are R's printed representation of single backslashes; cat() on this object prints (\1\2).)
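Here is the question's example with a raw string, as a base R sketch. sub() stands in for stringr::str_replace(). With stringr loaded, str_replace("back\\slash", r"(\\)", ":") should behave the same. The raw string only removes R's string-level escaping. The regex still needs \\ to match one literal backslash, exactly like r"\\" in the Python version.

```r
# Requires R >= 4.0.0 for r"(...)" raw strings
x <- "back\\slash"           # one literal backslash

# The raw string holds two backslash characters: the regex for one backslash
pattern <- r"(\\)"
print(identical(pattern, "\\\\"))  # [1] TRUE

print(sub(pattern, ":", x))        # [1] "back:slash"
```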

answered Oct 25 '22 at 20:10 by Ben Bolker