I have a character vector like the following:
char <- c("cancer_6_53_7575_tumor.csv", "control_7_4_7363_healthy.csv")
I want to extract the portion of each string that starts with the "7" in the 4-digit patient ID and ends just before the ".", but the following method doesn't work when there is a 7 before the patient ID.
values <- unlist(qdapRegex::rm_between(char, "7", ".", extract = TRUE))
How do I specify that it must start with the 7 in the 4 digit number?
You can use this:
char <- c("cancer_6_53_7575_tumor.csv", "control_7_4_7363_healthy.csv")
gsub(".*(7\\d{3}.*)\\..*$", "\\1", char)
[1] "7575_tumor" "7363_healthy"
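Explanation: 7\\d{3} matches a 7 followed by three digits. The capture group in (7\\d{3}.*)\\. keeps everything from that 7 up to the last dot, and \\1 in the replacement refers back to the captured group, so the rest of the string is discarded.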
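If you would rather extract the match than replace around it, here is a minimal base R sketch. It assumes the patient ID always sits between two underscores, which holds for the two sample file names but is not stated in the question. The name `pattern` is only illustrative.

```r
char <- c("cancer_6_53_7575_tumor.csv", "control_7_4_7363_healthy.csv")

# Assumption: the 4-digit ID is preceded and followed by an underscore.
# (?<=_) is a lookbehind, so the leading underscore is required but not returned.
# [^.]* then takes everything up to, but not including, the first dot.
pattern <- "(?<=_)7\\d{3}_[^.]*"

# perl = TRUE is needed because lookbehind is a PCRE feature.
values <- regmatches(char, regexpr(pattern, char, perl = TRUE))
print(values)
```

Because the pattern insists on three digits and an underscore right after the 7, the lone "7" in "control_7_4_..." is skipped. Note that regmatches() drops elements with no match, so the result can be shorter than the input if some names do not follow this layout.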
Another way is to use stringr:
library(stringr)
str_extract(char, '7\\d{3}[^\\.]*')
## [1] "7575_tumor" "7363_healthy"
It matches four digits starting with 7, followed by everything up to (but not including) the first dot.