I have a string a like this one:
stundenwerte_FF_00691_19260101_20131231_hist.zip
and would like to extract the 5-digit number "00691" from it.
I tried using gregexpr and regmatches as well as stringr::str_extract, but couldn't figure out the right regexp.
I came as far as:
gregexpr("[:digits{5}:]",a)
I expected this to return 5-digit numbers, but it doesn't and I don't understand how to fix it.
This does not work properly:
m <- gregexpr("[:digits{5}:]",a)
regmatches(a,m)
Thanks for your help in advance!
You could simply use sub to grab the digits; IMO regmatches is not necessary for this simple case.
x <- 'stundenwerte_FF_00691_19260101_20131231_hist.zip'
sub('\\D*(\\d{5}).*', '\\1', x)
# [1] "00691"
Edit: If you have other strings that contain digits before the 5-digit number, modify the expression slightly so that it anchors on the surrounding underscores:
sub('.*_(\\d{5})_.*', '\\1', x)
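One caveat with the sub approach: when the pattern does not match, sub returns the input unchanged rather than a missing value. A minimal sketch of a guard, assuming a vector of file names (the second name below is made up for illustration):

```r
# First name is from the question; the second is a hypothetical
# file name without a 5-digit field.
files <- c("stundenwerte_FF_00691_19260101_20131231_hist.zip",
           "no_station_id_here.zip")

# sub() leaves non-matching strings untouched ...
ids <- sub(".*_(\\d{5})_.*", "\\1", files)

# ... so mark them as missing explicitly.
has_id <- grepl("_\\d{5}_", files)
ids[!has_id] <- NA_character_

ids
```

Whether NA is the right placeholder depends on what the downstream code expects.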
1) sub
sub(".*_(\\d{5})_.*", "\\1", x)
## [1] "00691"
2) gsubfn::strapplyc The regexp can be slightly simplified if we use strapplyc:
library(gsubfn)
strapplyc(x, "_(\\d{5})_", simplify = TRUE)
## [1] "00691"
3) read.table If we know that the number is always the third underscore-separated field:
read.table(text = x, sep = "_", colClasses = "character")$V3
## [1] "00691"
3a) or, with strsplit:
strsplit(x, "_")[[1]][3]
## [1] "00691"
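If there are several file names, the field-based approach in 3a) can probably be applied to the whole vector at once. This is only a sketch; the second file name is invented for illustration and the helper variable names are not from the answer above:

```r
# First name is from the question; the second is hypothetical.
files <- c("stundenwerte_FF_00691_19260101_20131231_hist.zip",
           "stundenwerte_FF_01234_19500101_20131231_hist.zip")

# Split every name on "_" and take the third piece of each.
# fixed = TRUE treats "_" as a literal string, not a regex.
pieces <- strsplit(files, "_", fixed = TRUE)
third_field <- vapply(pieces, function(p) p[3], character(1))

third_field
```

Names with fewer than three fields would yield NA here, since indexing past the end of a character vector returns NA.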
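You could try the regex below, which uses negative lookaround assertions. We can't use word boundaries here, as in \\b\\d{5}\\b, because the characters preceding and following the number are underscores, and _ is itself a word character (matched by \w), so there is no word boundary between _ and a digit.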
> x <- "stundenwerte_FF_00691_19260101_20131231_hist.zip"
> m <- regexpr("(?<!\\d)\\d{5}(?!\\d)", x, perl=TRUE)
> regmatches(x, m)
[1] "00691"
> m <- gregexpr("(?<!\\d)\\d{5}(?!\\d)", x, perl=TRUE)
> regmatches(x, m)[[1]]
[1] "00691"
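The difference between the three candidate patterns can be checked directly. The sketch below uses the string from the question; the variable names bare, wb and la are just illustrative:

```r
x <- "stundenwerte_FF_00691_19260101_20131231_hist.zip"

# A bare five-digit pattern also matches the first five digits
# of each longer date field.
bare <- regmatches(x, gregexpr("[[:digit:]]{5}", x))[[1]]

# Word boundaries find nothing: "_" counts as a word character,
# so there is no boundary between "_" and a digit.
wb <- regmatches(x, regexpr("\\b\\d{5}\\b", x, perl = TRUE))

# Lookarounds require a non-digit (or the string edge) on both sides.
la <- regmatches(x, regexpr("(?<!\\d)\\d{5}(?!\\d)", x, perl = TRUE))

bare  # three matches, only the first is wanted
wb    # character(0)
la    # "00691"
```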
Explanation:
(?<!\\d) Negative lookbehind: asserts that the character preceding the match is not a digit.
\\d{5} Matches exactly 5 digits.
(?!\\d) Negative lookahead: asserts that the character following the match is not a digit.
Let the string be:
ss ="stundenwerte_FF_00691_19260101_20131231_hist.zip"
You can split the string and unlist the substrings:
ll = unlist(strsplit(ss,'_'))
Then build a logical index that is TRUE for the substrings that are 5 characters long:
idx = sapply(ll, nchar)==5
And select those substrings:
ll[idx]
[1] "00691"
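Note that a length test alone accepts any 5-character piece, not only digits. A possible refinement, sketched here with a hypothetical variant of the file name whose last piece ("h.zip") also happens to be 5 characters long:

```r
# Hypothetical variant of the file name from the question.
ss2 <- "stundenwerte_FF_00691_19260101_20131231_h.zip"
parts <- unlist(strsplit(ss2, "_"))

# Length alone selects two pieces here.
by_length <- parts[nchar(parts) == 5]

# Requiring exactly five digits selects only the number.
by_digits <- parts[grepl("^[0-9]{5}$", parts)]

by_length
by_digits
```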