I match and replace 4-digit numbers preceded and followed by white space with:
str12 <- "coihr 1234 &/()= jngm 34 ljd"
sub("\\s\\d{4}\\s", "", str12)
[1] "coihr&/()= jngm 34 ljd"
but, every try to invert this and extract the number instead fails. I want:
[1] 1234
does someone has a clue?
ps: I know how to do it with {stringr} but am wondering if it's possible with {base} only..
require(stringr)
gsub("\\s", "", str_extract(str12, "\\s\\d{4}\\s"))
[1] "1234"
Add the $ anchor. /^SW\d{4}$/ . It's because of the \w+ where \w+ match one or more alphanumeric characters. \w+ matches digits as well.
To match any number from 0 to 9 we use \d in regex. It will match any single digit number from 0 to 9. \d means [0-9] or match any number from 0 to 9. Instead of writing 0123456789 the shorthand version is [0-9] where [] is used for character range.
Alternation is the term in regular expression that is actually a simple “OR”. In a regular expression it is denoted with a vertical line character | . For instance, we need to find programming languages: HTML, PHP, Java or JavaScript.
Short for regular expression, a regex is a string of text that lets you create patterns that help match, locate, and manage text. Perl is a great example of a programming language that utilizes regular expressions. However, its only one of the many places you can find regular expressions.
regmatches(), only available since R-2.14.0, allows you to "extract or replace matched substrings from match data obtained by regexpr, gregexpr or regexec"
Here are examples of how you could use regmatches() to extract either the first whitespace-cushioned 4-digit substring in your input character string, or all such substrings.
## Example strings and pattern
x <- "coihr 1234 &/()= jngm 34 ljd" # string with 1 matching substring
xx <- "coihr 1234 &/()= jngm 3444 6789 ljd" # string with >1 matching substring
pat <- "(?<=\\s)(\\d{4})(?=\\s)"
## Use regexpr() to extract *1st* matching substring
as.numeric(regmatches(x, regexpr(pat, x, perl=TRUE)))
# [1] 1234
as.numeric(regmatches(xx, regexpr(pat, xx, perl=TRUE)))
# [1] 1234
## Use gregexpr() to extract *all* matching substrings
as.numeric(regmatches(xx, gregexpr(pat, xx, perl=TRUE))[[1]])
# [1] 1234 3444 6789
(Note that this will return numeric(0) for character strings not containing a substring matching your criteria).
It's possible to capture group in regex using (). Taking the same example
str12 <- "coihr 1234 &/()= jngm 34 ljd"
gsub(".*\\s(\\d{4})\\s.*", "\\1", str12)
[1] "1234"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With