Extracting substrings beginning with XX.XXXX

Tags:

regex

r

I have a string

x <- "24.3483 stuff stuff 34.8325 some more stuff"

Each substring I would like to extract begins with a number matching [0-9]{2}\\.[0-9]{4}. For the above example, I would like the output to be equivalent to

[1] "24.3483 stuff stuff"     "34.8325 some more stuff"

I've already looked at "R split on delimiter (split) keep the delimiter (split)" and tried:

> unlist(strsplit(x, "(?<=[[0-9]{2}\\.[0-9]{4}])", perl=TRUE))
[1] "24.3483 stuff stuff 34.8325 some more stuff"

which isn't what I want. I have also looked at "How should I split and retain elements using strsplit?".

asked Jan 22 '26 20:01

Clarinetist

2 Answers

You may use

x <- "24.3483 stuff stuff 34.8325 some more stuff"
unlist(strsplit(x, "\\s+(?=[0-9]{2}\\.[0-9]{4})", perl=TRUE))
[1] "24.3483 stuff stuff"     "34.8325 some more stuff"

See the regex demo and the R demo.

Details

\s+ - one or more whitespace characters. Requiring at least one should prevent a match at the start of the string; you may replace it with \\s*\\b (in R string syntax) if a match may have no whitespace before it.
(?=[0-9]{2}\.[0-9]{4}) - a positive lookahead that requires two digits, a literal dot, and four digits immediately to the right of the current location, without consuming that text, so the number stays in the following piece.
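
To illustrate why the lookahead matters, here is a small sketch comparing the pattern above with the same pattern without the lookahead. The variable names with_lookahead and without_lookahead are only for illustration. Without the lookahead the number is part of the delimiter, so strsplit() discards it.

```r
x <- "24.3483 stuff stuff 34.8325 some more stuff"

# Lookahead: only the whitespace is consumed, the number stays in the next piece
with_lookahead <- unlist(strsplit(x, "\\s+(?=[0-9]{2}\\.[0-9]{4})", perl = TRUE))
print(with_lookahead)
# [1] "24.3483 stuff stuff"     "34.8325 some more stuff"

# No lookahead: whitespace and number are both consumed as the delimiter
without_lookahead <- unlist(strsplit(x, "\\s+[0-9]{2}\\.[0-9]{4}", perl = TRUE))
print(without_lookahead)
# [1] "24.3483 stuff stuff" " some more stuff"
```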

answered Jan 25 '26 16:01

Wiktor Stribiżew

If you're sure there won't be digits in the intervening text ...

stringr::str_extract_all(x, "[0-9]{2}\\.[0-9]{4}[^0-9]+")

(The first match includes a trailing space; you could remove it with trimws().)
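
For readers without stringr installed, the same idea can be sketched in base R with gregexpr() and regmatches(), followed by trimws() to drop the trailing space. This is a swapped-in equivalent rather than the code from the answer, and the variable names m and parts are only for illustration. It carries the same assumption that the intervening text contains no digits.

```r
x <- "24.3483 stuff stuff 34.8325 some more stuff"

# Find every NN.NNNN followed by a run of non-digit characters
m <- gregexpr("[0-9]{2}\\.[0-9]{4}[^0-9]+", x)

# Extract the matches for the single input string, then strip surrounding whitespace
parts <- trimws(regmatches(x, m)[[1]])
print(parts)
# [1] "24.3483 stuff stuff"     "34.8325 some more stuff"
```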

Alternatively you can use stringr::str_locate_all() to find starting positions. It's a little clunky but ...

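# Start position of each NN.NNNN match, plus a sentinel one past the end of the string
pos <- stringr::str_locate_all(x, "[0-9]{2}\\.[0-9]{4}")[[1]][, "start"]
pos <- c(pos, nchar(x) + 1)
# Cut x from each start to just before the next start; returns a list
Map(substr, pos[-length(pos)], pos[-1] - 1, x = x)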

answered Jan 25 '26 15:01

Ben Bolker
