 

Regular expression to keep some matches, remove others

Tags:

regex

r

I'm having trouble with this regular expression. Consider the following vector.

> vec <- c("new jersey", "south dakota", "virginia:chincoteague",
           "washington:whidbey island", "new york:main")

Of the strings that contain a ":", I would like to keep only those with "main" after the ":", resulting in

[1] "new jersey" "south dakota" "new york:main"

So far, I've only been able to get there with this ugly nested nightmare, which is quite obviously far from optimal.

> g1 <- grep(":", vec)
> vec[ -g1[grep("main", grep(":", vec, value = TRUE), invert = TRUE)] ]
# [1] "new jersey"    "south dakota"  "new york:main"
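For comparison, here is a minimal sketch of the same filter written as two logical tests combined with |, instead of nested index arithmetic. It is not the single regular expression asked for; it only spells out what the nested code computes. The names no_colon, has_main and keep are illustrative.

```r
vec <- c("new jersey", "south dakota", "virginia:chincoteague",
         "washington:whidbey island", "new york:main")

# TRUE for elements that contain no ":" at all
no_colon <- !grepl(":", vec, fixed = TRUE)
# TRUE for elements that contain the literal text ":main"
has_main <- grepl(":main", vec, fixed = TRUE)

# Keep an element if either condition holds
keep <- vec[no_colon | has_main]
print(keep)
```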

How can I write a single regular expression that keeps ":main" but removes the other strings containing ":"?

Rich Scriven asked Jan 11 '23 08:01



1 Answer

Use alternation with | to pick the elements that either contain ":main" or do not contain ":" at all:

> vec <- c("new jersey", "south dakota", "virginia:chincoteague",
+            "washington:whidbey island", "new york:main")
> grep(":main|^[^:]*$", vec)
[1] 1 2 5
> vec[grep(":main|^[^:]*$", vec)]
[1] "new jersey"    "south dakota"  "new york:main"
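In the pattern, ^[^:]*$ matches a string made only of non-colon characters from start (^) to end ($), and :main matches that literal text anywhere. Passing value = TRUE makes grep return the matching strings directly. One possible caveat, sketched below under an assumption: if the data could contain something like "maine:mainland" (a hypothetical element, not in the question), the unanchored :main would keep it too; anchoring with :main$ requires ":main" to end the string. Whether that stricter behaviour is wanted depends on your data.

```r
vec <- c("new jersey", "south dakota", "virginia:chincoteague",
         "washington:whidbey island", "new york:main",
         "maine:mainland")  # hypothetical extra element for the edge case

# Answer's pattern: ":main" anywhere, OR no ":" from start to end
loose <- grep(":main|^[^:]*$", vec, value = TRUE)

# Anchored variant: ":main" must be at the very end of the string
strict <- grep(":main$|^[^:]*$", vec, value = TRUE)

print(loose)   # also keeps "maine:mainland"
print(strict)  # drops it
```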
falsetru answered Jan 18 '23 16:01
