Extract a regular expression match

Tags: regex, r


Use the stringr package, which wraps the existing regular expression operations in a consistent syntax and adds a few that are missing:

library(stringr)
str_locate("aaa12xxx", "[0-9]+")
#      start end
# [1,]     4   5
str_extract("aaa12xxx", "[0-9]+")
# [1] "12"

It is probably a bit hasty to say 'ignore the standard functions': the help file for ?gsub even points to regmatches in its 'See also' section:

‘regmatches’ for extracting matched substrings based on the results of ‘regexpr’, ‘gregexpr’ and ‘regexec’.

So this will work, and is fairly simple:

txt <- "aaa12xxx"
regmatches(txt,regexpr("[0-9]+",txt))
#[1] "12"
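One caveat when applying this to a vector: regmatches() with regexpr() silently drops elements that have no match, so the result can be shorter than the input. The sketch below shows that behaviour and one possible workaround; extract_first is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper (not part of base R): return the first match per
# element, keeping NA where an element has no match.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x)                 # -1 marks "no match"
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)         # fill only the matching positions
  out
}

x <- c("aaa12xxx", "nodigits", "b7")

regmatches(x, regexpr("[0-9]+", x))
# [1] "12" "7"

extract_first(x, "[0-9]+")
# [1] "12" NA   "7"
```

Keeping the output the same length as the input is usually what you want when the result goes back into a data frame column.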

For your specific case, you could remove everything that is not a digit:

gsub("[^0-9]", "", "aaa12xxxx")
# [1] "12"

It won't work in more complex cases, because separate runs of digits get concatenated:

gsub("[^0-9]", "", "aaa12xxxx34")
# [1] "1234"
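If each run of digits is wanted separately, a base R option is gregexpr() together with regmatches(), which returns every match rather than only the first. A minimal sketch using the same input string:

```r
txt <- "aaa12xxxx34"

# gregexpr() finds all matches; regmatches() returns a list with one
# character vector per input string, so take the first list element.
runs <- regmatches(txt, gregexpr("[0-9]+", txt))[[1]]
runs
# [1] "12" "34"

# The first run alone:
runs[1]
# [1] "12"
```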

You can use Perl-style lazy (non-greedy) matching with a capture group:

sub(".*?([0-9]+).*", "\\1", "aaa12xx99", perl = TRUE)
# [1] "12"

Substituting out the non-digits would give the wrong result in this case ("1299"), because both runs of digits are kept and joined together.
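To see why the lazy quantifier matters in the sub() call, it may help to compare it with the greedy version of the same pattern. The sketch below also shows what happens when the string contains no digits at all: sub() leaves the input unchanged rather than returning NA or an empty string, so that case may need a separate check.

```r
x <- "aaa12xx99"

# Greedy ".*" swallows as much as it can, leaving only the last digit
# for the capture group.
greedy <- sub(".*([0-9]+).*", "\\1", x, perl = TRUE)
greedy
# [1] "9"

# Lazy ".*?" stops at the first digit, so the group captures the first run.
lazy <- sub(".*?([0-9]+).*", "\\1", x, perl = TRUE)
lazy
# [1] "12"

# No digits: the pattern does not match and the input comes back as is.
no_match <- sub(".*?([0-9]+).*", "\\1", "abc", perl = TRUE)
no_match
# [1] "abc"
```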