
A reliable way to tell if = is for assignment in R code?

Tags:

r

I'm a stubborn useR who uses = instead of <- all the time, and apparently many R programmers frown on this. I wrote the formatR package, which can replace = with <- based on the parser package. As some of you might know, parser was orphaned on CRAN a few days ago. Although it is back now, this made me hesitant to depend on it. I'm wondering if there is another way to safely replace = with <-, because not every = means assignment, e.g. fun(a = 1). Regular expressions are unlikely to be reliable (see line 18 of the mask.inline() function in formatR), but I would certainly appreciate it if you can improve mine. Perhaps the codetools package can help?

A few test cases:

# should replace
a = matrix(1, 1)
a = matrix(
  1, 1)

(a = 1)
a =
  1

function() {
  a = 1
}

# should not replace
c(
  a = 1
  )

c(
  a = c(
  1, 2))
asked Jun 30 '12 22:06 by Yihui Xie



1 Answer

This answer uses regular expressions. There are a few edge cases where it will fail but it should be okay for most code. If you need perfect matching then you'll need to use a parser, but the regexes can always be tweaked if you run into problems.

Watch out for

#quoted function names
`my cr*azily*named^function!`(x = 1:10)
#Nested brackets inside functions
mean(x = (3 + 1:10))
#assignments inside if or for blocks
if((x = 10) > 3) cat("foo")
#functions running over multiple lines will currently fail
#maybe fixable with paste(original_code, collapse = "\n")
mean(
  x = 1:10
)
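As a rough illustration of how brackets inside a call can trip up the masking pattern used later in this answer, the sketch below applies that pattern (assuming the character class is written `[[:alnum:]._]`) to a call whose first argument is itself bracketed. The example string is made up for illustration. Because `[^)]*` stops at the first closing bracket, only part of the call is hidden and the `=` after it stays exposed.

```r
# Pattern from the answer: a function name, "(", anything but ")", then ")".
pattern <- "[[:alpha:]._][[:alnum:]._]*\\([^)]*\\)"

# Hypothetical input: the first argument contains its own brackets.
nested <- "mean((1:3), trim = 0)"

# What the pattern treats as "function contents".
hidden <- regmatches(nested, gregexpr(pattern, nested))[[1]]
print(hidden)    # "mean((1:3)"

# What is left once the match is swapped for the placeholder.
leftover <- sub(pattern, "PLACEHOLDER", nested)
print(leftover)  # "PLACEHOLDER, trim = 0)"

# The argument's "=" is still visible, so a blanket gsub("=", "<-", ...)
# would wrongly rewrite it.
print(grepl("=", leftover, fixed = TRUE))
```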

The code is based upon an example on the ?regmatches page. The basic idea is: swap function contents for a placeholder, do the replacement, then put your function contents back.

#Sample code.  For a real source file, use
#readLines("source_file.R")
original_code <- c("a = 1", "b = mean(x = 1)")

#Function contents are considered to be a function name,
#an open bracket, some stuff, then a close bracket.
#Here function names are considered to be a letter or
#dot or underscore followed by optional letters, numbers, dots or
#underscores.  This matches a few non-valid names (see ?make.names
#and warning above).
function_content <- gregexpr(
  "[[:alpha:]._][[:alnum:]._]*\\([^)]*\\)",
  original_code
)

#Take a copy of the code to modify
copy <- original_code

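#Replace all instances of function contents with the word PLACEHOLDER.
#If you have that word inside your code already, things will break.
#Note: gsub() uses only the first element of a multi-element pattern
#(with a warning), so only the first function call on each line is masked.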
copy <- mapply(
  function(pattern, replacement, x) 
  {
    if(length(pattern) > 0) 
    {
      gsub(pattern, replacement, x, fixed = TRUE) 
    } else x
  }, 
  pattern = regmatches(copy, function_content), 
  replacement = "PLACEHOLDER", 
  x = copy,
  USE.NAMES = FALSE
)

#Replace = with <-
copy <- gsub("=", "<-", copy)

#Now substitute back your function contents
(fixed_code <- mapply(
  function(pattern, replacement, x) 
  {
    if(length(replacement) > 0) 
    {
      gsub(pattern, replacement, x, fixed = TRUE) 
    } else x
  }, 
  pattern = "PLACEHOLDER", 
  replacement = regmatches(original_code, function_content), 
  x = copy,
  USE.NAMES = FALSE
))

#Write back to your source file
#writeLines(fixed_code, "source_file_fixed.R")
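For the cases where the regexes fall short, a parser-based check is one option. The sketch below is not part of the approach above and is not how formatR does it. It assumes a recent version of R where `utils::getParseData()` is available, and relies on the parse data tagging an assignment `=` as `EQ_ASSIGN`, while `=` in a call argument or in function formals gets a different token (`EQ_SUB` or `EQ_FORMALS`). The helper name `replace_eq_assign` is made up for illustration, and the column arithmetic assumes plain ASCII source without tabs, one line per vector element.

```r
# Hypothetical helper: rewrite only those "=" that the R parser itself
# classifies as assignment, leaving named arguments and formals alone.
replace_eq_assign <- function(code_lines) {
  # keep.source = TRUE is needed so that parse data is recorded.
  exprs <- parse(text = code_lines, keep.source = TRUE)
  pd <- utils::getParseData(exprs)
  if (is.null(pd)) return(code_lines)

  # Terminal tokens tagged EQ_ASSIGN are the assignment "=" signs.
  eq <- pd[pd$terminal & pd$token == "EQ_ASSIGN", c("line1", "col1")]

  # Work right to left within each line so earlier columns stay valid
  # after a one-character "=" becomes a two-character "<-".
  eq <- eq[order(eq$line1, -eq$col1), ]

  for (i in seq_len(nrow(eq))) {
    ln  <- eq$line1[i]
    col <- eq$col1[i]
    line <- code_lines[ln]
    code_lines[ln] <- paste0(
      substr(line, 1, col - 1), "<-", substring(line, col + 1)
    )
  }
  code_lines
}

# Sample input mixing assignments, named arguments and formals.
input <- c(
  "a = 1",
  "b = mean(x = 1)",
  "c(",
  "  a = 1",
  ")",
  "f = function(y = 2) {",
  "  z = y",
  "}",
  "if((x = 10) > 3) cat(\"foo\")"
)
result <- replace_eq_assign(input)
writeLines(result)
```

Since the parser decides which `=` is an assignment, multi-line calls and backtick-quoted function names need no special handling here. The trade-off is that the input must be syntactically valid R, otherwise `parse()` stops with an error.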
answered Sep 28 '22 00:09 by Richie Cotton
