Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

reverse of getParseData: from parsed code back to code

Tags:

parsing

r

HI I'm using getParseData to fix something in the code. For example replace = with <-

txt = "
flag = F

if(flag){
        dat = data.frame(x = 1, stringAsFactor = F)
} else {
        dat <- 1
}
"

sf = parse(text = txt)
p = getParseData(sf)

p[p$token == 'EQ_ASSIGN', 'text'] = '<-'

Now how to go back from p - a data.frame showing the parsed code - to R code as a string? Thanks


update: try getParseText

firstly I can't get the example in getParseText to work:

fn <- function(x) {
        x + 1 # A comment, kept as part of the source
}

d <- getParseData(fn)
d
# NULL

Then I try to mimic the code in ?getParseText in my example:

txt = "
flag = F

f2 = 1 + 1

if(flag){
        dat = data.frame(x = 1, stringAsFactor = F)
} else {
        dat <- 1
}
"

sf = parse(text = txt)
p = getParseData(sf)

plus <- which(p$token == "'+'")
sum <- p$parent[plus]
p[as.character(sum), ]
cat(getParseText(p, sum))
# 1 + 1

cat(getParseText(p, unique(p$parent)))
# not correct

cat(paste0(unique(getParseText(p, p$id)), collapse=" ")) # incorrect
# flag = F f2 1 + 1 1 + if(flag){
#         dat = data.frame(x = 1, stringAsFactor = F)
# } else {
#         dat <- 1
# } if ( ) {
#         dat = data.frame(x = 1, stringAsFactor = F)
# } { dat data.frame(x = 1, stringAsFactor = F) data.frame x , stringAsFactor } else {
#         dat <- 1
# } dat <- 1 <-
like image 451
YJZ Avatar asked Oct 11 '25 08:10

YJZ


1 Answers

You don't need to use getParseData to replace the = symbols with <- in R code. One of the amazing things about R is that you can operate directly on the language so we will do that here.

sf = parse(text = txt)

sf is an expression object, which effectively is a list of R language objects, one for each top-level statement:

sf[[1]]
## flag = F
sf[[2]]
## if (flag) {
##     dat = data.frame(x = 1, stringAsFactor = F)
## } else {
##     dat <- 1
## }

The above are language calls. A call is a an unevaluated R statement, which you get from parse, or with quote:

my.call <- quote(1 + 1)
my.call
## 1 + 1
class(my.call)
## [1] "call"

The thing about calls is that R lies to you about their underlying structures. R calls are lists (well, pairlists really but the distinction is irrelevant here) that R displays and treats specially. We can show their true nature:

as.list(my.call)
## [[1]]
## `+`
## 
## [[2]]
## [1] 1
## 
## [[3]]
## [1] 1

Note how the leading element of the call is the "function", or in this case the "operator", which in R is just a function anyway (more precisely the name of the function/operator). This is always the case with calls. The first element is the function, the following elements are arguments. R pretends operators are special and displays them differently, but for underlying call structure and evaluation they are the same.

Look at what we can do:

my.call[[1]] <- as.name('-')
my.call
## 1 - 1

We used as.name to create a special type of R object called a symbol. These can be used to reference functions in calls. As you can imagine, if we can replace + with -, we can also do the same for = and <-. But to do it systematically we need to recurse through the language tree. We will write a simple function that does just that:

symb_rep <- function(lang, from, to) {
  if(is.call(lang)) {
    if(lang[[1]] == from) lang[[1]] <- to
    lang[-1] <- lapply(lang[-1], symb_rep, from, to)
  }
  lang
}

Then we can run it on our original expression, which recall is a list of calls, so we use lapply to apply to each element:

lang.sub <- lapply(sf, symb_rep, as.name("="), as.name("<-"))
lang.sub
## [[1]]
## flag <- F
## 
## [[2]]
## if (flag) {
##     dat <- data.frame(x = 1, stringAsFactor = F)
## } else {
##     dat <- 1
## }

If you want the character representation back you can use deparse:

unlist(lapply(lang.sub, deparse))
## [1] "flag <- F"                                       
## [2] "if (flag) {"                                     
## [3] "    dat <- data.frame(x = 1, stringAsFactor = F)"
## [4] "} else {"                                        
## [5] "    dat <- 1"                                    
## [6] "}" 

Pretty cool, no?

One last point, you'll notice that in data.frame(x = 1, ...), the = was not replaced. Why is that? Well, that = does not actually exist in the call data. It is displayed by R as decoration. In reality, notice how the parameter names are stored:

as.list(quote(data.frame(x=1, y=2)))
## [[1]]
## data.frame
## 
## $x
## [1] 1
## 
## $y
## [1] 2

There is no = in sight, because the parameter names are just names of the call object. R just displays equal signs as a visual aid when it prints out the call, and interprets them the same way when it parses the call. This "lie" is why these two expressions are semantically distinct:

data.frame(x <- 5)
##   x....5
## 1      5
x
## [1] 5
data.frame(x = 1)
##   x
## 1 1
x
## [1] 5

In the former R assigns 5 to x in the global environment, and makes up a name for the data frame parameter. In the second R uses x as the parameter name but doesn't assign to the global environment. Because R relies on the = in parameter lists to get parameter names it cannot use it with the normal assignment semantics.

like image 75
BrodieG Avatar answered Oct 13 '25 21:10

BrodieG