I have string inputs in the following format:
my.strings <- c("FACT11", "FACT11:FACT20", "FACT1sometext:FACT20", "FACT1text with spaces:FACT20", "FACT14:FACT20", "FACT1textAnd1312:FACT2etc", "FACT12:FACT22:FACT31")
I would like to extract every "FACT" together with the first digit that follows it. So the result from this example would be:
c("FACT1", "FACT1 FACT2", "FACT1 FACT2", "FACT1 FACT2", "FACT1 FACT2", "FACT1 FACT2", "FACT1 FACT2 FACT3")
Alternatively, the result could be a list, where each element of the list is a vector with 1 up to 3 items.
What I have so far is:
gsub("(FACT[1-3]).*?:(FACT[1-3]).*", '\\1 \\2', my.strings)
# [1] "FACT11" "FACT1 FACT2" "FACT1 FACT2" "FACT1 FACT2" "FACT1 FACT2" "FACT1 FACT2"
# [7] "FACT1 FACT2"
It mostly looks good, except that the first element comes back as "FACT11" instead of "FACT1" (the second "1" should be dropped), and "FACT3" is missing from the last element of my.strings. But adding another group to gsub somehow messes the whole thing up.
gsub("(FACT[1-3]).*?:(FACT[1-3]).*?:(FACT[1-3]).*?", '\\1 \\2 \\3', my.strings)
# [1] "FACT11" "FACT11:FACT20" "FACT1sometext:FACT20"
# [4] "FACT1text with spaces:FACT20" "FACT14:FACT20" "FACT1textAnd1312:FACT2etc"
# [7] "FACT1 FACT2 FACT31"
So how can I properly extract the groups?
You may use a base R approach, too:
> m <- regmatches(my.strings, gregexpr("FACT[1-3]", my.strings))
> sapply(m, paste, collapse=" ")
[1] "FACT1"
[2] "FACT1 FACT2"
[3] "FACT1 FACT2"
[4] "FACT1 FACT2"
[5] "FACT1 FACT2"
[6] "FACT1 FACT2"
[7] "FACT1 FACT2 FACT3"
Extract all matches with your FACT[1-3] (or FACT[0-9], or FACT\\d) pattern, and then "join" them with a space.
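If the list form mentioned in the question is preferred, the intermediate result of regmatches() can be kept as it is, since it already holds one character vector per input string. The sketch below is a minimal illustration using the FACT[0-9] variant of the pattern; the names fact.list, fact.strings and unchanged are made up for this example and do not appear in the original post. The last line also shows why the gsub attempts left "FACT11" untouched: gsub returns an element unchanged when the pattern does not match it.

```r
my.strings <- c("FACT11", "FACT11:FACT20", "FACT1sometext:FACT20",
                "FACT1text with spaces:FACT20", "FACT14:FACT20",
                "FACT1textAnd1312:FACT2etc", "FACT12:FACT22:FACT31")

# gregexpr() locates every match in each string; regmatches() extracts them.
# The result is a list with one character vector per input string.
fact.list <- regmatches(my.strings, gregexpr("FACT[0-9]", my.strings))

fact.list[[1]]  # "FACT1"
fact.list[[7]]  # "FACT1" "FACT2" "FACT3"

# Collapse each vector into one space-separated string.
# vapply() is used instead of sapply() so the return type is always character.
fact.strings <- vapply(fact.list, paste, character(1), collapse = " ")

# gsub() leaves an element as it is when the pattern finds no match,
# so a string without a colon is returned unmodified.
unchanged <- gsub("(FACT[1-3]).*?:(FACT[1-3]).*", "\\1 \\2", "FACT11")
```

Note that FACT[0-9] and FACT\\d accept any digit after FACT, while FACT[1-3] only accepts 1, 2 or 3; which one fits depends on the data.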