I have a math expression, for example:
((2-x+3)^2+(x-5+7)^10)^0.5
I need to replace the ^ operator with calls to the pow function of the C language. I think a regex is what I need, but I am not fluent in regex. So far I have ended up with this:
(\([^()]*)*(\s*\([^()]*\)\s*)+([^()]*\))*
I don't know how to improve this. Can you advise me on how to solve this problem?
The expected output:
pow(pow(2-x+3,2)+pow(x-5+7,10),0.5)
One of the most fantastic things about R is that you can easily manipulate R expressions with R. Here, we recursively traverse your expression and replace all instances of ^ with pow:
f <- function(x) {
  if(is.call(x)) {
    # swap the operator at the head of the call
    if(identical(x[[1L]], as.name("^"))) x[[1L]] <- as.name("pow")
    # recurse into the arguments, if there are any
    if(length(x) > 1L) x[2L:length(x)] <- lapply(x[2L:length(x)], f)
  }
  x
}
f(quote(((2-x+3)^2+(x-5+7)^10)^0.5))
## pow((pow((2 - x + 3), 2) + pow((x - 5 + 7), 10)), 0.5)
This should be more robust than the regex since you are relying on the natural interpretation of the R language rather than on text patterns that may or may not be comprehensive.
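Since the goal is text that can be pasted into C source, the call returned by f still has to be turned back into a string. A minimal sketch using deparse follows; f is repeated so the block runs on its own, and the names res and txt are only illustrative. The extra parentheses in the result are redundant but harmless in C.

```r
# f is repeated from above so this block is self-contained.
f <- function(x) {
  if (is.call(x)) {
    if (identical(x[[1L]], as.name("^"))) x[[1L]] <- as.name("pow")
    if (length(x) > 1L) x[2L:length(x)] <- lapply(x[2L:length(x)], f)
  }
  x
}

res <- f(quote(((2-x+3)^2+(x-5+7)^10)^0.5))

# deparse() can split a long call over several strings, so collapse them.
txt <- paste(deparse(res), collapse = "")
txt
## [1] "pow((pow((2 - x + 3), 2) + pow((x - 5 + 7), 10)), 0.5)"
```

For much longer expressions the collapsed string may contain extra whitespace where deparse broke the line, which C does not mind.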
Details: Calls in R are stored in list-like structures, with the function/operator at the head of the list and the arguments in the following elements. For example, consider:
exp <- quote(x ^ 2)
exp
## x^2
is.call(exp)
## [1] TRUE
We can examine the underlying structure of the call with as.list:
str(as.list(exp))
## List of 3
## $ : symbol ^
## $ : symbol x
## $ : num 2
As you can see, the first element is the function/operator, and subsequent elements are the arguments to the function.
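The same structure applies at every level of a nested expression, which is why a recursive walk is needed. As a small illustration (the variable name nested is just for this sketch), note that parentheses are themselves stored as calls to the ( function, which explains the extra parentheses in the pow output:

```r
nested <- quote((x + 1)^2)

# Head of the outer call is the ^ operator.
nested[[1]]
## `^`

# Its first argument is itself a call, to the ( function.
nested[[2]]
## (x + 1)
nested[[2]][[1]]
## `(`

# Inside the parentheses sits yet another call, this time to +.
nested[[2]][[2]]
## x + 1
```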
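So, in our recursive function, we:
1. Check whether the object is a call with is.call(x).
2. If it is, check whether it is a call to the ^ function/operator by looking at the first element of the call with identical(x[[1L]], as.name("^")).
3. If so, replace the first element with as.name("pow").
4. Then, regardless of whether this was a call to ^ or anything else, if the call has additional elements, apply this function to each of them with x[2L:length(x)] <- lapply(x[2L:length(x)], f).
5. Return the (possibly modified) object; anything that is not a call comes back unchanged.
Note that calls often contain the names of functions as the first element. You can create those names with as.name. Names are also referred to as "symbols" in R (hence the output of str).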
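To make the role of as.name concrete, here is a short sketch that builds a call to pow by hand from a symbol and its arguments. The variable names pow_sym and cl are only for illustration, and pow does not need to exist as an R function because the call is never evaluated.

```r
# A name (symbol) for a function that need not exist in R.
pow_sym <- as.name("pow")
is.symbol(pow_sym)
## [1] TRUE

# Assemble a call: function name at the head, arguments after it.
cl <- as.call(list(pow_sym, as.name("x"), 2))
cl
## pow(x, 2)

deparse(cl)
## [1] "pow(x, 2)"
```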
Here is a solution that follows the parse tree recursively and replaces ^:
#parse the expression
#alternatively you could create it with
#expression(((2-x+3)^2+(x-5+7)^10)^0.5)
e <- parse(text = "((2-x+3)^2+(x-5+7)^10)^0.5")
#a recursive function
fun <- function(e) {
  #check if you are at the end of the tree's branch
  if (is.name(e) || is.atomic(e)) {
    #replace ^
    if (e == quote(`^`)) return(quote(pow))
    return(e)
  }
  #follow the tree with recursion
  for (i in seq_along(e)) e[[i]] <- fun(e[[i]])
  return(e)
}
#deparse to get a character string
deparse(fun(e)[[1]])
#[1] "pow((pow((2 - x + 3), 2) + pow((x - 5 + 7), 10)), 0.5)"
This would be much easier if rapply worked with expressions/calls.
Edit:
The OP asked about performance. Performance is very unlikely to be an issue for this task, but in any case the regex solution is not faster:
library(microbenchmark)
microbenchmark(regex = {
  v <- "((2-x+3)^2+(x-5+7)^10)^0.5"
  x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE)
  while(x) {
    v <- sub("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", "pow(\\2, \\3)", v, perl=TRUE);
    x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE)
  }
},
BrodieG = {
  deparse(f(parse(text = "((2-x+3)^2+(x-5+7)^10)^0.5")[[1]]))
},
Roland = {
  deparse(fun(parse(text = "((2-x+3)^2+(x-5+7)^10)^0.5"))[[1]])
})
#Unit: microseconds
#    expr     min      lq     mean  median      uq     max neval cld
#   regex 321.629 323.934 335.6261 335.329 337.634 384.623   100   c
# BrodieG 238.405 246.087 255.5927 252.105 257.227 355.943   100  b
#  Roland 211.518 225.089 231.7061 228.802 235.204 385.904   100 a
I haven't included the solution provided by @digEmAll, because it seems obvious that a solution with that many data.frame operations will be relatively slow.
Edit2:
Here is a version that also handles sqrt.
fun <- function(e) {
  #check if you are at the end of the tree's branch
  if (is.name(e) || is.atomic(e)) {
    #replace ^
    if (e == quote(`^`)) return(quote(pow))
    return(e)
  }
  if (e[[1]] == quote(sqrt)) {
    #replace sqrt
    e[[1]] <- quote(pow)
    #add the second argument
    e[[3]] <- quote(0.5)
  }
  #follow the tree with recursion
  for (i in seq_along(e)) e[[i]] <- fun(e[[i]])
  return(e)
}
e <- parse(text = "sqrt((2-x+3)^2+(x-5+7)^10)")
deparse(fun(e)[[1]])
#[1] "pow(pow((2 - x + 3), 2) + pow((x - 5 + 7), 10), 0.5)"