I have a string
x <- "24.3483 stuff stuff 34.8325 some more stuff"
The pattern [0-9]{2}\\.[0-9]{4} marks the beginning of each substring I would like to extract. For the above example, I would like the output to be equivalent to
[1] "24.3483 stuff stuff" "34.8325 some more stuff"
I've already looked at "R split on delimiter (split) keep the delimiter (split)":
> unlist(strsplit(x, "(?<=[[0-9]{2}\\.[0-9]{4}])", perl=TRUE))
[1] "24.3483 stuff stuff 34.8325 some more stuff"
which isn't what I want, as well as "How should I split and retain elements using strsplit?".
You may use
x <- "24.3483 stuff stuff 34.8325 some more stuff"
unlist(strsplit(x, "\\s+(?=[0-9]{2}\\.[0-9]{4})", perl=TRUE))
[1] "24.3483 stuff stuff" "34.8325 some more stuff"
See the regex demo and the R demo.
Details
\s+ - 1+ whitespaces (this should prevent a match at the start of the string, you may replace it with \\s*\\b if the matches can have no whitespaces before)
(?=[0-9]{2}\.[0-9]{4}) - a positive lookahead that requires (does not consume the text!) 2 digits, ., and 4 digits immediately to the right of the current location.
If you're sure there won't be digits in the intervening text ...
stringr::str_extract_all(x, "[0-9]{2}\\.[0-9]{4}[^0-9]+")
(the first match keeps the trailing space before the next number; you could use trimws() to remove it)
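If you'd rather not depend on stringr, roughly the same extraction can be sketched in base R with gregexpr() and regmatches(). This is a swapped-in equivalent, not the stringr call above, and the names pattern and parts are only illustrative.

```r
x <- "24.3483 stuff stuff 34.8325 some more stuff"
pattern <- "[0-9]{2}\\.[0-9]{4}[^0-9]+"

# gregexpr() locates every match; regmatches() pulls out the matched text
parts <- regmatches(x, gregexpr(pattern, x))[[1]]

# remove the whitespace left at the end of each piece
parts <- trimws(parts)
parts
```

As with the stringr version, this assumes the text between the numbers contains no digits.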
Alternatively you can use stringr::str_locate_all() to find starting positions. It's a little clunky but ...
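# start position of every number
pos <- stringr::str_locate_all(x, "[0-9]{2}\\.[0-9]{4}")[[1]][, "start"]
# add a sentinel one past the end of the string
pos <- c(pos, nchar(x) + 1)
# cut from each start to just before the next one
Map(substr, pos[-length(pos)], pos[-1] - 1, x = x)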
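The locate-then-cut idea can also be wrapped up in a small helper. The sketch below is one possible version, assuming base R's gregexpr() in place of str_locate_all(); the function name split_at_starts is hypothetical, and it returns a trimmed character vector rather than a list.

```r
# Hypothetical helper: cut x at every position where `pattern` starts.
split_at_starts <- function(x, pattern = "[0-9]{2}\\.[0-9]{4}") {
  # start positions of all matches (-1 means no match at all)
  starts <- as.integer(gregexpr(pattern, x, perl = TRUE)[[1]])
  if (starts[1] == -1L) return(x)
  # each piece ends just before the next start; the last one ends with x
  ends <- c(starts[-1] - 1L, nchar(x))
  # substring() is vectorised over the start and end positions
  trimws(substring(x, starts, ends))
}

x <- "24.3483 stuff stuff 34.8325 some more stuff"
res <- split_at_starts(x)
res
```

Unlike the [^0-9]+ approach, this doesn't care whether the intervening text contains digits, as long as they don't match the pattern itself. Any text before the first match is dropped.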