
Prevent grep in R from treating "." as a letter

Tags: regex, r

I have a character vector that contains text similar to the following:

text <- c("ABc.def.xYz", "ge", "lmo.qrstu")

I would like to remove everything up to and including the last .:

> "xYz" "ge" "qrstu"

However, the grep function seems to be treating . as a letter:

pattern <- "([A-Z]|[a-z])+$"

grep(pattern, text, value = TRUE)

> "ABc.def.xYz" "ge"          "lmo.qrstu" 

The pattern works elsewhere, such as on regexpal.

How can I get grep to behave as expected?

sdgfsdh asked Dec 03 '22 16:12


1 Answer

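grep is for finding which elements match a pattern. It returns the indices of the elements of the vector that match, or, if value=TRUE is specified, the matching elements themselves. It is not treating . as a letter: the pattern is anchored only at the end ($), so every element that ends in one or more letters matches, and grep returns the whole element rather than just the matched part. From the description, it seems that you want to remove a substring instead of returning a subset of the initial vector.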
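To make the difference concrete, here is a minimal sketch using the vector and pattern from the question. The variable names idx, vals, and parts are illustrative only. It assumes base R with no extra packages, and it uses regmatches() with regexpr() purely to show which part of each element the pattern actually matched.

```r
text <- c("ABc.def.xYz", "ge", "lmo.qrstu")
pattern <- "([A-Z]|[a-z])+$"

# Indices of the elements that contain a match
idx <- grep(pattern, text)

# The matching elements themselves, returned whole
vals <- grep(pattern, text, value = TRUE)

# Only the part of each element that the pattern matched
parts <- regmatches(text, regexpr(pattern, text))

print(idx)
print(vals)
print(parts)
```

All three elements end in letters, so all three are selected. The dot was never matched by the pattern; it is simply part of the element that grep hands back.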

If you need to remove the substring, you can use sub:

 sub('.*\\.', '', text)
 #[1] "xYz"   "ge"    "qrstu"

The first argument is the pattern, '.*\\.'. It matches zero or more characters (.*) followed by a literal dot (\\.). The \\ escapes the . so that it is treated as the dot character itself instead of "any character". Because .* is greedy, the match extends to the last . in the string. The second argument, '', replaces the matched text with an empty string and thereby removes it. An element with no dot, such as "ge", does not match and is returned unchanged.
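As a rough illustration of the greedy behaviour, the sketch below compares the pattern from this answer with a variant that stops at the first dot. The variant pattern '^[^.]*\\.' and the names after_last and after_first are not from the original answer; they are added here only for contrast.

```r
text <- c("ABc.def.xYz", "ge", "lmo.qrstu")

# Greedy .* runs to the LAST dot, so only the final segment remains
after_last <- sub(".*\\.", "", text)

# [^.]* cannot cross a dot, so this removes only up to the FIRST dot
after_first <- sub("^[^.]*\\.", "", text)

print(after_last)
print(after_first)
```

In both cases "ge" is left untouched because it contains no dot for the pattern to match.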

akrun answered Dec 22 '22 01:12