To match a character that has a special meaning in a regex, escape it with a backslash (\). For example, \. matches ".", \+ matches "+", and \( matches "(". To match a literal backslash, use \\. Inside an R string literal each backslash must itself be doubled, so the regex \. is written "\\." and the regex \\ is written "\\\\".
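A small base-R sketch of the escaping rules above. The sample vector x is made up for illustration and is not from the original question:

x <- c("a.b", "a+b", "f(x)", "C:\\temp")   # the last element is C:\temp

# Each regex backslash is doubled inside the R string literal
has_dot   <- grepl("\\.", x)     # regex \.
has_plus  <- grepl("\\+", x)     # regex \+
has_paren <- grepl("\\(", x)     # regex \(
has_slash <- grepl("\\\\", x)    # regex \\ matches one literal backslash

# fixed = TRUE treats the pattern as a plain string, so no escaping is needed
has_dot_fixed <- grepl(".", x, fixed = TRUE)

has_dot
# [1]  TRUE FALSE FALSE FALSE

If you only need a literal search, fixed = TRUE is usually the simpler option.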
Use the stringr package, which wraps the existing regular expression operations in a consistent syntax and adds a few that are missing:
library(stringr)
str_locate("aaa12xxx", "[0-9]+")
#      start end
# [1,]     4   5
str_extract("aaa12xxx", "[0-9]+")
# [1] "12"
It is probably a bit hasty to say 'ignore the standard functions'. The help page ?gsub even points to the right tool in its 'See also' section:
‘regmatches’ for extracting matched substrings based on the results of ‘regexpr’, ‘gregexpr’ and ‘regexec’.
So this will work, and is fairly simple:
txt <- "aaa12xxx"
regmatches(txt,regexpr("[0-9]+",txt))
#[1] "12"
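If the string may contain several numbers, or none at all, the same base-R approach still works. This is a minimal sketch; the sample strings are only illustrative:

txt <- "aaa12xxx34"

# regexpr() finds only the first match
first_match <- regmatches(txt, regexpr("[0-9]+", txt))
# [1] "12"

# gregexpr() finds every match; regmatches() then returns a list, one element per input string
all_matches <- regmatches(txt, gregexpr("[0-9]+", txt))[[1]]
# [1] "12" "34"

# With no match, the result is a zero-length character vector rather than NA
no_match <- regmatches("abc", regexpr("[0-9]+", "abc"))
# character(0)

Note the last case: unlike str_extract, which returns NA, regmatches drops non-matching elements, so check the length of the result before using it.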
For your specific case you could remove everything that is not a digit:
gsub("[^0-9]", "", "aaa12xxxx")
# [1] "12"
It won't work in more complex cases, because separate runs of digits get glued together:
gsub("[^0-9]", "", "aaa12xxxx34")
# [1] "1234"
You can use lazy matching with Perl-compatible regular expressions:
sub(".*?([0-9]+).*", "\\1", "aaa12xx99", perl = TRUE)
# [1] "12"
Simply deleting the non-digits would give the wrong result here ("1299"), because the digits of both numbers end up concatenated.
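To see why the lazy quantifier matters, here is a rough comparison, plus a capture-group alternative using regexec for when you want both numbers. The second pattern is my own illustration and assumes exactly two runs of digits:

txt <- "aaa12xx99"

# Lazy .*? stops as early as possible, so the group captures the first number
first_num <- sub(".*?([0-9]+).*", "\\1", txt, perl = TRUE)
# [1] "12"

# Greedy .* swallows as much as it can, leaving only the final digit for the group
greedy <- sub(".*([0-9]+).*", "\\1", txt, perl = TRUE)
# [1] "9"

# regexec() returns the full match followed by each capture group
parts <- regmatches(txt, regexec("([0-9]+)[^0-9]+([0-9]+)", txt))[[1]]
# [1] "12xx99" "12"     "99"

Keep in mind that sub returns the input unchanged when the pattern does not match, so a string with no digits comes back as-is rather than as an empty result.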