
Substitute the ^ (power) symbol with C's pow syntax in mathematical expression

Tags: regex, r

I have a math expression, for example:

((2-x+3)^2+(x-5+7)^10)^0.5

I need to replace the ^ operator with calls to C's pow function. I think a regex is what I need, but I am not a regex expert. This is the regex I ended up with:

(\([^()]*)*(\s*\([^()]*\)\s*)+([^()]*\))*

I don't know how to improve this. Can you advise me on how to solve this problem?

The expected output:

pow(pow(2-x+3,2)+pow(x-5+7,10),0.5)
Kalinkin Alexey asked Nov 15 '16 09:11


2 Answers

One of the most fantastic things about R is that you can easily manipulate R expressions with R. Here, we recursively traverse your expression and replace all instances of ^ with pow:

f <- function(x) {
  if(is.call(x)) {
    if(identical(x[[1L]], as.name("^"))) x[[1L]] <- as.name("pow")
    if(length(x) > 1L) x[2L:length(x)] <- lapply(x[2L:length(x)], f)
  }
  x
}
f(quote(((2-x+3)^2+(x-5+7)^10)^0.5))

## pow((pow((2 - x + 3), 2) + pow((x - 5 + 7), 10)), 0.5)

This should be more robust than the regex since you are relying on the natural interpretation of the R language rather than on text patterns that may or may not be comprehensive.
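To illustrate the robustness claim, here is a small sketch that runs the same f on a few inputs that are awkward for a text pattern. The test expressions (a^b^c, -x^2, (a + b)^(n - 1)) are chosen for illustration and are not from the question; f is repeated only so the snippet runs on its own.

```r
# Same function as in the answer above, repeated so this snippet is self-contained.
f <- function(x) {
  if (is.call(x)) {
    if (identical(x[[1L]], as.name("^"))) x[[1L]] <- as.name("pow")
    if (length(x) > 1L) x[2L:length(x)] <- lapply(x[2L:length(x)], f)
  }
  x
}

# `^` is right-associative in R; the parser has already grouped a^b^c as a^(b^c).
chained <- deparse(f(quote(a^b^c)))
chained
## [1] "pow(a, pow(b, c))"

# Unary minus binds more loosely than `^`, so -x^2 means -(x^2).
negated <- deparse(f(quote(-x^2)))
negated
## [1] "-pow(x, 2)"

# The exponent does not have to be a numeric literal.
symbolic <- deparse(f(quote((a + b)^(n - 1))))
symbolic
## [1] "pow((a + b), (n - 1))"
```

In each case the grouping comes from R's parser, so the function never has to reason about precedence or balanced parentheses itself.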


Details: Calls in R are stored in list-like structures, with the function / operator at the head of the list and the arguments in the following elements. For example, consider:

exp <- quote(x ^ 2)
exp
## x^2
is.call(exp)
## [1] TRUE

We can examine the underlying structure of the call with as.list:

str(as.list(exp))
## List of 3
##  $ : symbol ^
##  $ : symbol x
##  $ : num 2

As you can see, the first element is the function/operator, and subsequent elements are the arguments to the function.
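Because a call behaves like a list, its elements can be read and replaced by position. A minimal sketch, using a hypothetical variable name cl:

```r
cl <- quote(x ^ 2)

# The head of the call is the symbol `^`.
head_is_caret <- identical(cl[[1L]], as.name("^"))
head_is_caret
## [1] TRUE

# Replacing the head turns the call into a call to pow.
cl[[1L]] <- as.name("pow")
cl
## pow(x, 2)

# Arguments can be replaced in the same way.
cl[[3L]] <- 10
deparse(cl)
## [1] "pow(x, 10)"
```

This single-call replacement is the step the recursive function applies at every level of the expression.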

So, in our recursive function, we:

  • Check if an object is a call
    • If yes: check if it is a call to the ^ function/operator by looking at the first element in the call with identical(x[[1L]], as.name("^"))
      • If yes: replace the first element with as.name("pow")
      • Then, irrespective of whether this was a call to ^ or anything else:
        • if the call has additional elements, cycle through them and apply this function (i.e. recurse) to each element, replacing the result back into the original call (x[2L:length(x)] <- lapply(x[2L:length(x)], f))
    • If no: just return the object unchanged

Note that calls usually contain the name of a function as the first element. You can create those names with as.name. Names are also referred to as "symbols" in R (hence the output of str).
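A short sketch of how names and symbols relate; the variable names s1 and s2 are hypothetical:

```r
s1 <- as.name("pow")
s2 <- quote(pow)

# Both routes produce the same object.
identical(s1, s2)
## [1] TRUE

# "name" and "symbol" are two words for the same thing.
is.name(s1)
## [1] TRUE
is.symbol(s1)
## [1] TRUE
class(s1)
## [1] "name"
typeof(s1)
## [1] "symbol"

# Operators need backticks inside quote().
identical(as.name("^"), quote(`^`))
## [1] TRUE

# A character string is not a symbol.
identical("pow", s1)
## [1] FALSE
```

This is why the function compares against as.name("^") rather than the string "^".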

BrodieG answered Oct 24 '22 16:10


Here is a solution that follows the parse tree recursively and replaces ^:

#parse the expression
#alternatively you could create it with
#expression(((2-x+3)^2+(x-5+7)^10)^0.5)
e <- parse(text = "((2-x+3)^2+(x-5+7)^10)^0.5")

#a recursive function
fun <- function(e) {    
  #check if you are at the end of the tree's branch
  if (is.name(e) || is.atomic(e)) { 
    #replace ^
    if (e == quote(`^`)) return(quote(pow))
    return(e)
  }
  #follow the tree with recursion
  for (i in seq_along(e)) e[[i]] <- fun(e[[i]])
  return(e)    
}

#deparse to get a character string    
deparse(fun(e)[[1]])
#[1] "pow((pow((2 - x + 3), 2) + pow((x - 5 + 7), 10)), 0.5)"

This would be much easier if rapply worked with expressions/calls.
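The deparsed result keeps the parentheses from the input, so it differs slightly from the output requested in the question. Parentheses are stored as calls to `(`, so they can be dropped where they wrap a direct argument of pow. A possible sketch, assuming plain arithmetic input; to_pow is a hypothetical helper, not part of either answer:

```r
# Hypothetical helper: replace `^` with pow and drop `(` calls that wrap
# a direct argument of pow, where the parentheses are redundant.
to_pow <- function(e) {
  if (!is.call(e)) return(e)
  # Rewrite the sub-expressions first.
  for (i in seq_along(e)) e[[i]] <- to_pow(e[[i]])
  if (identical(e[[1L]], as.name("^"))) {
    e[[1L]] <- as.name("pow")
    for (i in 2:length(e)) {
      # Unwrap (possibly nested) parentheses around this argument.
      while (is.call(e[[i]]) && identical(e[[i]][[1L]], as.name("("))) {
        e[[i]] <- e[[i]][[2L]]
      }
    }
  }
  e
}

out <- deparse(to_pow(quote(((2-x+3)^2+(x-5+7)^10)^0.5)))
out
## [1] "pow(pow(2 - x + 3, 2) + pow(x - 5 + 7, 10), 0.5)"

# Removing the spaces is purely cosmetic.
compact <- gsub(" ", "", out, fixed = TRUE)
compact
## [1] "pow(pow(2-x+3,2)+pow(x-5+7,10),0.5)"
```

Parentheses elsewhere in the expression are left alone, since they may be needed to preserve operator precedence.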

Edit:

The OP asked about performance. Performance is very unlikely to matter for this task, but in any case the regex solution is not faster:

library(microbenchmark)
microbenchmark(regex = {
  v <- "((2-x+3)^2+(x-5+7)^10)^0.5"
  x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE)
  while(x) {
    v <- sub("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", "pow(\\2, \\3)", v, perl=TRUE);
    x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE)
  }
},
BrodieG = {
  deparse(f(parse(text = "((2-x+3)^2+(x-5+7)^10)^0.5")[[1]]))
},
Roland = {
  deparse(fun(parse(text = "((2-x+3)^2+(x-5+7)^10)^0.5"))[[1]])
})

#Unit: microseconds
#    expr     min      lq     mean  median      uq     max neval cld
#   regex 321.629 323.934 335.6261 335.329 337.634 384.623   100   c
# BrodieG 238.405 246.087 255.5927 252.105 257.227 355.943   100  b 
#  Roland 211.518 225.089 231.7061 228.802 235.204 385.904   100 a

I haven't included the solution provided by @digEmAll, because it seems obvious that a solution with that many data.frame operations will be relatively slow.

Edit2:

Here is a version that also handles sqrt.

fun <- function(e) {    
  #check if you are at the end of the tree's branch
  if (is.name(e) || is.atomic(e)) { 
    #replace ^
    if (e == quote(`^`)) return(quote(pow))
    return(e)
  }
  if (e[[1]] == quote(sqrt)) {
    #replace sqrt
    e[[1]] <- quote(pow)
    #add the second argument
    e[[3]] <- quote(0.5)
  }
  #follow the tree with recursion
  for (i in seq_along(e)) e[[i]] <- fun(e[[i]])
  return(e)    
}

e <- parse(text = "sqrt((2-x+3)^2+(x-5+7)^10)")
deparse(fun(e)[[1]])
#[1] "pow(pow((2 - x + 3), 2) + pow((x - 5 + 7), 10), 0.5)"
Roland answered Oct 24 '22 14:10