
Regex matching everything that's not a 4 digit number

Tags:

regex

r

I match and replace 4-digit numbers preceded and followed by whitespace with:

str12 <- "coihr 1234 &/()= jngm 34 ljd"
sub("\\s\\d{4}\\s", "", str12)
[1] "coihr&/()= jngm 34 ljd"

But every attempt to invert this and extract the number instead fails. I want:

[1] 1234

Does anyone have a clue?

PS: I know how to do it with {stringr}, but I am wondering whether it's possible with {base} only.

require(stringr)
gsub("\\s", "", str_extract(str12, "\\s\\d{4}\\s"))
[1] "1234"
Kay asked Aug 24 '12 19:08



2 Answers

regmatches(), only available since R-2.14.0, allows you to "extract or replace matched substrings from match data obtained by regexpr, gregexpr or regexec"
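To make "match data" concrete, here is a small sketch (the pattern `\\d{4}` and the replacement text `"XXXX"` are illustrative choices, not from the question): `regexpr()` returns the start position of the first match together with a `match.length` attribute, and `regmatches()` uses that to cut out the substring or, through the replacement form `regmatches<-`, to overwrite it.

```r
x <- "coihr 1234 &/()= jngm 34 ljd"

# regexpr() returns the start of the first match plus a match.length attribute
m <- regexpr("\\d{4}", x)
start <- as.integer(m)            # start position of the match: 7
len <- attr(m, "match.length")    # length of the match: 4

# regmatches() uses that match data to extract the substring
found <- regmatches(x, m)         # "1234"

# The replacement form rewrites the matched substring in place
y <- x
regmatches(y, m) <- "XXXX"
y                                 # "coihr XXXX &/()= jngm 34 ljd"
```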

Here are examples of how you could use regmatches() to extract either the first whitespace-cushioned 4-digit substring in your input character string, or all such substrings.

## Example strings and pattern
x <- "coihr 1234 &/()= jngm 34 ljd"          # string with 1 matching substring
xx <- "coihr 1234 &/()= jngm 3444  6789 ljd" # string with >1 matching substring
pat <- "(?<=\\s)(\\d{4})(?=\\s)"

## Use regexpr() to extract *1st* matching substring
as.numeric(regmatches(x, regexpr(pat, x, perl=TRUE)))
# [1] 1234
as.numeric(regmatches(xx, regexpr(pat, xx, perl=TRUE)))
# [1] 1234


## Use gregexpr() to extract *all* matching substrings
as.numeric(regmatches(xx, gregexpr(pat, xx, perl=TRUE))[[1]])
# [1] 1234 3444 6789

(Note that this will return numeric(0) for character strings that contain no substring matching your criteria.)
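When `x` is a vector of several strings, keep in mind that `regmatches()` with `regexpr()` match data drops the elements that have no match, so the result can be shorter than the input. A minimal sketch that keeps one slot per input string and fills non-matches with `NA` (the helper name `first_4digit` is hypothetical):

```r
# Hypothetical helper: first whitespace-cushioned 4-digit number in each
# string, NA where a string contains no such number
first_4digit <- function(v) {
  pat <- "(?<=\\s)\\d{4}(?=\\s)"
  m <- regexpr(pat, v, perl = TRUE)
  out <- rep(NA_real_, length(v))
  hit <- as.integer(m) != -1L               # regexpr() uses -1 for "no match"
  out[hit] <- as.numeric(regmatches(v, m))  # regmatches() returns matches only
  out
}

first_4digit(c("coihr 1234 &/()= jngm 34 ljd", "no digits here", "a 5678 b"))
# [1] 1234   NA 5678
```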

Josh O'Brien answered Nov 15 '22 15:11


You can also capture a group with parentheses, (), and keep only that group in the replacement with \\1. Taking the same example:

str12 <- "coihr 1234 &/()= jngm 34 ljd"
gsub(".*\\s(\\d{4})\\s.*", "\\1", str12)
[1] "1234"
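Two things to keep in mind with the `gsub()` approach: a string with no match comes back unchanged, and because the leading `.*` is greedy, a string with several qualifying numbers yields the last one. A base-R sketch that still relies on the capture group but avoids both issues pairs `regexec()`, whose match data includes each capture group, with `regmatches()` (the variable names `xx` and `none` are illustrative):

```r
str12 <- "coihr 1234 &/()= jngm 34 ljd"
xx    <- "coihr 1234 &/()= jngm 3444  6789 ljd"
none  <- "no four-digit number here"

# gsub() with a capture group: greedy .* keeps only the last match,
# and a non-matching string is returned unchanged
gsub(".*\\s(\\d{4})\\s.*", "\\1", xx)    # "6789"
gsub(".*\\s(\\d{4})\\s.*", "\\1", none)  # "no four-digit number here"

# regexec() records the full match followed by each capture group,
# so element [2] of each result is the captured number
strs <- c(str12, xx, none)
m <- regmatches(strs, regexec("\\s(\\d{4})\\s", strs))
first <- vapply(m, function(g) if (length(g) > 1) g[2] else NA_character_,
                character(1))
first
# [1] "1234" "1234" NA
```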
dickoa answered Nov 15 '22 16:11