I have string inputs in the following format:
my.strings <- c("FACT11", "FACT11:FACT20", "FACT1sometext:FACT20", "FACT1text with spaces:FACT20", "FACT14:FACT20", "FACT1textAnd1312:FACT2etc", "FACT12:FACT22:FACT31")
I would like to extract every "FACT" together with the first digit that follows it. For this example, the result would be:
c("FACT1", "FACT1 FACT2", "FACT1 FACT2", "FACT1 FACT2", "FACT1 FACT2", "FACT1 FACT2", "FACT1 FACT2 FACT3")
Alternatively, the result could be a list, where each element is a vector of one to three items.
What I have so far is:
gsub("(FACT[1-3]).*?:(FACT[1-3]).*", '\\1 \\2', my.strings)
# [1] "FACT11" "FACT1 FACT2 " "FACT1 FACT2 " "FACT1 FACT2 " "FACT1 FACT2 " "FACT1 FACT2 "
# [7] "FACT1 FACT2 " "FACT1 FACT2 "
It kinda looks good, except that the first element is "FACT11" instead of "FACT1" (the second "1" should be dropped), and "FACT3" is missing from the last element of my.strings. But adding another group to gsub somehow messes the whole thing up:
gsub("(FACT[1-3]).*?:(FACT[1-3]).*?:(FACT[1-3]).*?", '\\1 \\2 \\3', my.strings)
# [1] "FACT11" "FACT11:FACT20" "FACT1sometext:FACT20"
# [4] "FACT1text with spaces:FACT20" "FACT14:FACT20" "FACT1textAnd1312:FACT2etc"
# [7] "FACT12:FACT21" "FACT1 FACT2 FACT31"
So how can I properly extract the groups?
You may use a base R approach, too:
> m <- regmatches(my.strings, gregexpr("FACT[1-3]", my.strings))
> sapply(m, paste, collapse=" ")
[1] "FACT1"
[2] "FACT1 FACT2"
[3] "FACT1 FACT2"
[4] "FACT1 FACT2"
[5] "FACT1 FACT2"
[6] "FACT1 FACT2"
[7] "FACT1 FACT2 FACT3"
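Extract all matches with your FACT[1-3] (or FACT[0-9], or FACT\\d) pattern, and then "join" them with a space. Since m is already a list with one character vector per input string, you can skip the sapply step if you prefer the list result.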
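For completeness, here is a self-contained sketch of both result shapes, using the FACT[0-9] variant of the pattern. The variable name res is only illustrative, and vapply is swapped in for sapply as a stricter alternative, not something the approach requires:

```r
my.strings <- c("FACT11", "FACT11:FACT20", "FACT1sometext:FACT20",
                "FACT1text with spaces:FACT20", "FACT14:FACT20",
                "FACT1textAnd1312:FACT2etc", "FACT12:FACT22:FACT31")

# gregexpr() locates every match in each string;
# regmatches() returns the matched substrings as a list
m <- regmatches(my.strings, gregexpr("FACT[0-9]", my.strings))

# List form: one character vector per input string
lengths(m)
# [1] 1 2 2 2 2 2 3

# String form: collapse each vector into one space-separated string
# (vapply checks that every result is a single character value)
res <- vapply(m, paste, character(1), collapse = " ")
```

Here m[[7]] holds the three separate matches for the last string, while res[7] holds them joined into one string.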