
Escaped Periods In R Regular Expressions

Tags: regex, r

Unless I am missing something, this regex seems pretty straightforward:

grepl("Processor\.[0-9]+\..*Processor\.Time", names(web02)) 

However, R doesn't like the escaped periods (\.), which I intend to match a literal period:

Error: '\.' is an unrecognized escape in character string starting "Processor\." 

What am I misunderstanding about this regex syntax?

asked Jul 09 '11 23:07 by Kyle Brandt



People also ask

How do you escape a period in regex?

The . (dot) is a metacharacter that matches any single character (letter, digit, whitespace, everything). Because of this, a bare dot does not match only a period; to match a literal period, escape the dot with a backslash: \. (written "\\." inside an R string).
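A minimal R sketch of the difference, using made-up strings: an unescaped dot matches any character, while "\\." in an R string literal reaches the regex engine as \. and matches only a period.

```r
x <- c("a.c", "abc")

# Unescaped "." is a metacharacter: it matches any single character.
any_char <- grepl("a.c", x)
print(any_char)     # TRUE TRUE

# "\\." in the R literal is the two characters \. , which the regex
# engine reads as a literal period.
literal_dot <- grepl("a\\.c", x)
print(literal_dot)  # TRUE FALSE
```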

What are escaped characters in regex?

The backslash \ is the escape character: it restores the literal meaning of the character that follows it. Besides . (dot), characters such as *, +, ? (quantifiers) and ^, $ (position anchors) have special meaning in regex, so you need to escape them to match them literally.
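As a rough illustration in R (the strings are made up for the example), here is the + quantifier unescaped, escaped, and matched with grepl's fixed = TRUE argument, which turns off regex interpretation entirely:

```r
x <- c("1+1", "11")

# "+" is a quantifier here: one or more "1", followed by "1".
unescaped <- grepl("1+1", x)                # FALSE TRUE
# "\\+" reaches the engine as \+, a literal plus sign.
escaped <- grepl("1\\+1", x)                # TRUE FALSE
# fixed = TRUE treats the whole pattern as a plain string.
literal <- grepl("1+1", x, fixed = TRUE)    # TRUE FALSE
```

fixed = TRUE is often the simplest choice when the pattern contains no regex features at all.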

What does escape do in regex?

Escaping a string converts it so that the regular expression engine interprets any metacharacters it contains as literal characters.
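Base R has no built-in regex-escape function, so the following is a sketch of a hypothetical helper, escape_regex(), that prefixes each metacharacter with a backslash. It assumes the PCRE engine (perl = TRUE) for the character class so that the backslash and bracket handling is well defined.

```r
# Hypothetical helper (not part of base R): put a backslash in front of
# every regex metacharacter so the result matches `x` literally.
escape_regex <- function(x) {
  # Pattern as the PCRE engine sees it: ([.|()\\^{}+$*?\[\]])
  # Replacement "\\\\\\1" is \\\1 to gsub: a literal backslash, then group 1.
  gsub("([.|()\\\\^{}+$*?\\[\\]])", "\\\\\\1", x, perl = TRUE)
}

escaped <- escape_regex("1+1 (approx.)")
writeLines(escaped)  # 1\+1 \(approx\.\)
grepl(escaped, c("1+1 (approx.)", "11 approx"))
```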

Do dashes need to be escaped in regex?

You only need to escape a dash if it could otherwise be interpreted as a range indicator, which can happen inside a character class.
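A small R sketch of the three cases, using made-up inputs. Placing the dash last is the portable option; the escaped form is shown with perl = TRUE because backslash handling inside brackets varies between regex engines.

```r
x <- c("m", "-")

# Between two characters inside [ ], a dash forms a range.
in_range <- grepl("[a-z]", x)                     # TRUE FALSE
# A dash placed last (or first) has no range to form, so it is literal.
dash_last <- grepl("[az-]", x)                    # FALSE TRUE
# With perl = TRUE, PCRE also accepts an escaped dash inside the class.
dash_escaped <- grepl("[a\\-z]", x, perl = TRUE)  # FALSE TRUE
```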


2 Answers

My R-Fu is weak to the point of being non-existent, but I think I know what's up.

The string-handling part of the R parser has to look inside string literals to convert \n and related escape sequences into the characters they stand for. R doesn't know what \. means, so it complains. You want the escaped dot to reach the regex engine, so you need to get a single \ past the string mangler. The usual way to do that is to escape the escape:

grepl("Processor\\.[0-9]+\\..*Processor\\.Time", names(web02)) 
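Since web02 isn't shown, here is a sketch that runs the corrected pattern against a few hypothetical column names standing in for names(web02):

```r
# Hypothetical column names standing in for names(web02).
cols <- c("Processor.0.Processor.Time",
          "Processor.12.foo.Processor.Time",
          "ProcessorX0.Processor.Time",
          "Memory.Available")

matches <- grepl("Processor\\.[0-9]+\\..*Processor\\.Time", cols)
print(matches)  # TRUE TRUE FALSE FALSE
cols[matches]
```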

Embedding one language (regular expressions) inside another language (R) is usually a bit messy, and even more so when both languages use the same escaping syntax.
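One way to see both layers is to print the string's actual characters with writeLines(), which shows what the regex engine receives. The last lines assume R 4.0.0 or later, which added raw string literals r"(...)" that skip the string-escape layer (older R versions will fail to parse this snippet).

```r
pattern <- "Processor\\.[0-9]+\\..*Processor\\.Time"

# writeLines() prints the characters R stored, i.e. what the regex
# engine receives after R has processed the string escapes.
writeLines(pattern)  # Processor\.[0-9]+\..*Processor\.Time
nchar("\\.")         # 2: one backslash and one dot

# Raw string (R >= 4.0.0): backslashes are kept as typed, no doubling.
raw_pattern <- r"(Processor\.[0-9]+\..*Processor\.Time)"
identical(pattern, raw_pattern)  # TRUE
```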

answered Oct 14 '22 11:10 by mu is too short



Instead of

\. 

Try

\\. 

You need to escape the backslash itself first.
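If the doubled backslashes get hard to read, a bracket expression is another way to ask for a literal period: inside [ ], "." is an ordinary character in both R's default engine and PCRE. This is a swapped-in alternative, sketched here with hypothetical column names:

```r
# Hypothetical column names standing in for names(web02).
cols <- c("Processor.0.Processor.Time", "ProcessorX0.Processor.Time")

escaped <- grepl("Processor\\.[0-9]+\\..*Processor\\.Time", cols)
# Inside [ ], "." loses its special meaning, so [.] needs no backslash.
bracketed <- grepl("Processor[.][0-9]+[.].*Processor[.]Time", cols)
print(bracketed)  # TRUE FALSE
```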

answered Oct 14 '22 12:10 by Cameron
