
Extracting substrings beginning with XX.XXXX

Tags: regex, r

I have a string

x <- "24.3483 stuff stuff 34.8325 some more stuff"

The pattern [0-9]{2}\\.[0-9]{4} marks the start of each substring I would like to extract. For the example above, the output should be equivalent to

[1] "24.3483 stuff stuff"     "34.8325 some more stuff"

I've already looked at R split on delimiter (split) keep the delimiter (split):

> unlist(strsplit(x, "(?<=[[0-9]{2}\\.[0-9]{4}])", perl=TRUE))
[1] "24.3483 stuff stuff 34.8325 some more stuff"

which isn't what I want, as well as How should I split and retain elements using strsplit?.

Clarinetist asked Jan 22 '26 20:01


2 Answers

You may use

x <- "24.3483 stuff stuff 34.8325 some more stuff"
unlist(strsplit(x, "\\s+(?=[0-9]{2}\\.[0-9]{4})", perl=TRUE))
[1] "24.3483 stuff stuff"     "34.8325 some more stuff"

See the regex demo and the R demo.

Details

  • \s+ - one or more whitespace characters (this prevents a match at the start of the string; you may replace it with \\s*\\b if a match may have no whitespace before it)
  • (?=[0-9]{2}\.[0-9]{4}) - a positive lookahead that requires, without consuming any text, two digits, a literal ., and four digits immediately to the right of the current location.
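
To make the split described above easier to reuse, here is a minimal sketch that wraps the same strsplit() call in a function. The name split_at_markers and the three-marker test string are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical helper: split a single string at whitespace that is
# immediately followed by a XX.XXXX marker.
split_at_markers <- function(s) {
  # \\s+ consumes the separating whitespace; the lookahead only checks
  # that a marker follows, so the marker stays with the next piece.
  unlist(strsplit(s, "\\s+(?=[0-9]{2}\\.[0-9]{4})", perl = TRUE))
}

x <- "24.3483 stuff stuff 34.8325 some more stuff"
res_x <- split_at_markers(x)
print(res_x)

# Illustrative input with three markers and a double space before one of them
x2 <- "11.1111 a  22.2222 b c 33.3333 d"
res_x2 <- split_at_markers(x2)
print(res_x2)
```

Because the whitespace in front of each marker is consumed by the split, the pieces come back without that separator.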
Wiktor Stribiżew answered Jan 25 '26 16:01


If you're sure there won't be digits in the intervening text ...

stringr::str_extract_all(x, "[0-9]{2}\\.[0-9]{4}[^0-9]+")

(each match keeps the whitespace that precedes the next marker, so a piece can end with an extra space; trimws() removes it)
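
For readers who prefer not to depend on stringr, the same extraction can probably be sketched in base R with gregexpr() and regmatches(), with trimws() applied as suggested. The helper name extract_pieces and the second test string are assumptions added for illustration; the second string shows the caveat about digits in the intervening text.

```r
# Base R sketch of the same extraction, with surrounding whitespace trimmed.
pattern <- "[0-9]{2}\\.[0-9]{4}[^0-9]+"

extract_pieces <- function(s) {
  # gregexpr() finds every match; regmatches() pulls out the matched text
  trimws(regmatches(s, gregexpr(pattern, s))[[1]])
}

x <- "24.3483 stuff stuff 34.8325 some more stuff"
pieces_x <- extract_pieces(x)
print(pieces_x)

# The caveat: a digit in the intervening text ends the match early,
# so "7 here" is lost from the first piece.
y <- "24.3483 item 7 here 34.8325 end"
pieces_y <- extract_pieces(y)
print(pieces_y)
```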

Alternatively you can use stringr::str_locate_all() to find starting positions. It's a little clunky but ...

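# start position of every XX.XXXX marker
pos <- stringr::str_locate_all(x, "[0-9]{2}\\.[0-9]{4}")[[1]][, "start"]
# append a sentinel one past the end so the last piece has an end point
pos <- c(pos, nchar(x) + 1)
# cut from each start to just before the next one; the result is a list
Map(substr, pos[-length(pos)], pos[-1] - 1, x = x)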
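
The position-based idea can also be sketched without stringr, using gregexpr() for the start positions and the vectorised substring() for the cuts. This is a base R variant rather than the code from the answer, and the helper name cut_at_starts and the second test string are assumptions for illustration. Unlike the [^0-9]+ pattern, this approach keeps digits that appear in the intervening text.

```r
# Hypothetical helper: cut a string at the start of each XX.XXXX marker.
cut_at_starts <- function(s) {
  starts <- as.integer(gregexpr("[0-9]{2}\\.[0-9]{4}", s)[[1]])
  if (starts[1] == -1L) return(character(0))  # gregexpr() reports no match as -1
  # each piece ends just before the next start; the last one ends at nchar(s)
  ends <- c(starts[-1] - 1L, nchar(s))
  # trimws() drops the space left in front of the next marker
  trimws(substring(s, starts, ends))
}

x <- "24.3483 stuff stuff 34.8325 some more stuff"
cut_x <- cut_at_starts(x)
print(cut_x)

# Digits between markers are kept, because only marker positions matter
y <- "24.3483 item 7 here 34.8325 end"
cut_y <- cut_at_starts(y)
print(cut_y)
```

Any text before the first marker is dropped by this sketch, since the first cut begins at the first marker.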
Ben Bolker answered Jan 25 '26 15:01


