In R, I am attempting to write code that will work on any variation of a string pattern. An example string is:
string <- "y ~ 1 + a + (b | c) + (d^2) + e + (1 | f) + g"
I would like to remove ONLY the portions that contain a pattern of "(, |, )" such as:
(b | c) and (1 | f)
and be left with:
"y ~ 1 + a + (d^2) + e + g"
Please note that the characters could change (e.g., 'b' could become '1' and 'c' could become 'predictor') and I would like the code to still work. Spaces are also not required: the string could be "y~1+a+(b|c)+(d^2)+e+(1|f)+g" or any mix of spaces and no spaces. The order of the terms could change as well, e.g. "y~1+a+(b|c)+e+(1|f)+(d^2)+g".
I have tried using base R string manipulation functions (gsub and sub) to search for the pattern of "(, |, )" by using variations of the pattern such as:
"\\(.*\\|.*\\)"
"\\(.*\\|"
"\\(.+\\|.+\\)"
"\\|.+\\)"
as well as many of the stringr functions to find this pattern and replace it with an empty string. However, with both base R and stringr, it removes EVERYTHING between the first "(" and the last "|" or ")", for example:
gsub("\\(.*\\|.*\\)", "", string)
produces:
"y ~ 1 + a + + g"
and
gsub("\\(.*\\|", "", string)
produces:
"y ~ 1 + a + f) + g"
I have additionally tried using the str_locate functions, but I am having trouble using them efficiently because there are multiple sets of parentheses and I only want the locations of the ones with a "|" between them.
Any help is greatly appreciated.
1) gsubfn. Define a function which returns an empty string or its input depending on whether the input has a | or not and run gsubfn with it. gsubfn is like gsub except the replacement string can be a function which takes the match as input and replaces it with the function's output.
library(gsubfn)
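# return "" for a match that contains a literal |, otherwise keep the match
pick <- function(x) if (grepl("|", x, fixed = TRUE)) "" else trimws(x)
# match each "+ ( ... )" term non-greedily and pass it through pick
gsubfn("[+] *[(].*?[)]", pick, string, perl = TRUE)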
## [1] "y ~ 1 + a + (d^2) + e + g"
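A plain gsub can do the same job if the pattern is not allowed to cross a parenthesis, which is what the greedy .* attempts in the question were missing. The following is only a sketch under some assumptions: every (... | ...) term is preceded by a +, and there are no nested parentheses inside it. The name bar_term is made up for this example.

```r
string <- "y ~ 1 + a + (b | c) + (d^2) + e + (1 | f) + g"

# Optional spaces, a "+", optional spaces, then one parenthesised group that
# contains a "|" but no further "(" or ")" (assumption: no nesting).
bar_term <- "\\s*\\+\\s*\\([^()|]*\\|[^()]*\\)"

res_spaced  <- gsub(bar_term, "", string, perl = TRUE)
res_compact <- gsub(bar_term, "", "y~1+a+(b|c)+e+(1|f)+(d^2)+g", perl = TRUE)

print(res_spaced)   # "y ~ 1 + a + (d^2) + e + g"
print(res_compact)  # "y~1+a+e+(d^2)+g"
```

Because [^()] cannot match a parenthesis, each match is confined to a single pair of parentheses, so (d^2) is left alone. A bar term that comes directly after the ~ has no leading + and would not be removed by this pattern.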
2) Base R. Split the input into terms and use grep to keep only the ones without |. Then put what is left back together using reformulate.
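# split on ~ and +, drop every piece that contains |, and trim whitespace
s <- trimws(grep("\\|", strsplit(string, "[~+]")[[1]], invert = TRUE, value = TRUE))
# s[1] is the response; the remaining elements are the terms to keep
reformulate(format(s[-1]), s[1])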
## y ~ 1 + a + (d^2) + e + g
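To check (2) against the other spellings mentioned in the question, it can be wrapped in a small function. drop_bar_terms is a name invented here, and the sketch assumes that no term contains a + or ~ of its own, for example (1 + b | c), because the split is done on those characters.

```r
# Hypothetical wrapper around approach (2); returns a formula object.
drop_bar_terms <- function(string) {
  parts <- strsplit(string, "[~+]")[[1]]   # response first, then the terms
  s <- trimws(grep("\\|", parts, invert = TRUE, value = TRUE))
  reformulate(s[-1], s[1])
}

variants <- c(
  "y ~ 1 + a + (b | c) + (d^2) + e + (1 | f) + g",
  "y~1+a+(b|c)+(d^2)+e+(1|f)+g",
  "y~1+a+(b|c)+e+(1|f)+(d^2)+g"
)

# deparse() turns each formula back into a string
out <- vapply(variants, function(v) deparse(drop_bar_terms(v)), character(1),
              USE.NAMES = FALSE)
print(out)
```

Since the result is rebuilt by reformulate, the spacing of the output is normalised regardless of how the input was spaced.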
3) getTerms. This also uses only base R but first parses the string into an expression representing a formula and then splits its right-hand side into terms using getTerms, found in this SO post: Terms of a sum in a R expression. It then proceeds as in (2).
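# getTerms() must be defined first (see the linked post)
p <- parse(text = string)[[1]]  # p[[2]] is the response, p[[3]] the right-hand side
s <- grep("\\|", getTerms(p[[3]]), value = TRUE, invert = TRUE)
reformulate(s, p[[2]])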
## y ~ 1 + a + (d^2) + e + g
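getTerms is not defined in this answer, so the code in (3) will not run on its own. Below is a minimal sketch of what such a helper could look like. It is an assumption written for illustration, not necessarily the code from the linked post: it walks the nested + calls of the right-hand side and collects the terms in a list.

```r
# Hypothetical helper: flatten a sum such as 1 + a + (b | c) into its terms.
getTerms <- function(e, acc = list()) {
  if (is.call(e) && length(e) == 3 && identical(e[[1]], as.name("+"))) {
    # binary +: keep the right operand and recurse into the left one
    getTerms(e[[2]], c(list(e[[3]]), acc))
  } else {
    c(list(e), acc)
  }
}

string <- "y ~ 1 + a + (b | c) + (d^2) + e + (1 | f) + g"
p <- parse(text = string)[[1]]

# deparse each term to a string, then drop the ones containing |
terms  <- vapply(getTerms(p[[3]]),
                 function(t) paste(deparse(t), collapse = ""), character(1))
keep   <- terms[!grepl("|", terms, fixed = TRUE)]
result <- reformulate(keep, p[[2]])
print(result)
```

Because the string is parsed rather than split on characters, a term such as (1 + b | c) stays in one piece, which the character split in (2) would not guarantee.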