Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the names of all functions in an R expression

I'm trying to find the names of all the functions used in an arbitrary legal R expression, but I can't find a function that will flag the below example as a function instead of a name.

test <- expression(
    this_is_a_function <- function(var1, var2){

    this_is_a_function(var1-1, var2)
})

all.vars(test, functions = FALSE)

[1] "this_is_a_function" "var1"              "var2" 

all.vars(expr, functions = FALSE) seems to return functions declarations (f <- function(){}) in the expression, while filtering out function calls ('+'(1,2), ...).

Is there any function - in the core libraries or elsewhere - that will flag 'this_is_a_function' as a function, not a name? It needs to work on arbitrary expressions, that are syntactically legal but might not evaluate correctly (e.g '+'(1, 'duck'))

I've found similar questions, but they don't seem to contain the solution.

If clarification is needed, leave a comment below. I'm using the parser package to parse the expressions.

Edit: @Hadley

I have expressions with contain entire scripts, which usually consist of a main function containing nested function definitions, with a call to the main function at the end of the script.

Functions are all defined inside the expressions, and I don't mind if I have to include '<-' and '{', since I can easy filter them out myself.

The motivation is to take all my R scripts and gather basic statistics about how my use of functions has changed over time.

Edit: Current Solution

A Regex-based approach grabs the function definitions, combined with the method in James' comment to grab function calls. Usually works, since I never use right-hand assignment.

function_usage <- function(code_string){
    # takes a script, extracts function definitions

    require(stringr)

    code_string <- str_replace(code_string, 'expression\\(', '')

    equal_assign <- '.+[ \n]+<-[ \n]+function'
    arrow_assign <- '.+[ \n]+=[ \n]+function'

    function_names <- sapply(
        strsplit(
            str_match(code_string, equal_assign), split = '[ \n]+<-'),    
        function(x) x[1])

    function_names <- c(function_names, sapply(
        strsplit(
            str_match(code_string, arrow_assign), split = '[ \n]+='),    
            function(x) x[1]))

        return(table(function_names))    
    }
like image 464
Róisín Grannell Avatar asked Jan 11 '13 10:01

Róisín Grannell


People also ask

What does the function names () do in R?

names() function in R Language is used to get or set the name of an Object. This function takes object i.e. vector, matrix or data frame as argument along with the value that is to be assigned as name to the object.

How do I list all objects in R?

ls() function in R Language is used to list the names of all the objects that are present in the working directory.

How many types of functions are in R?

There are mainly three types of function in R programming: Primitive Functions. Infix Functions. Replacement Functions.


2 Answers

Short answer: is.function checks whether a variable actually holds a function. This does not work on (unevaluated) calls because they are calls. You also need to take care of masking:

mean <- mean (x)

Longer answer:

IMHO there is a big difference between the two occurences of this_is_a_function.

In the first case you'll assign a function to the variable with name this_is_a_function once you evaluate the expression. The difference is the same difference as between 2+2 and 4.
However, just finding <- function () does not guarantee that the result is a function:

f <- function (x) {x + 1} (2)

The second occurrence is syntactically a function call. You can determine from the expression that a variable called this_is_a_function which holds a function needs to exist in order for the call to evaluate properly. BUT: you don't know whether it exists from that statement alone. however, you can check whether such a variable exists, and whether it is a function.

The fact that functions are stored in variables like other types of data, too, means that in the first case you can know that the result of function () will be function and from that conclude that immediately after this expression is evaluated, the variable with name this_is_a_function will hold a function.

However, R is full of names and functions: "->" is the name of the assignment function (a variable holding the assignment function) ...

After evaluating the expression, you can verify this by is.function (this_is_a_function). However, this is by no means the only expression that returns a function: Think of

f <- function () {g <- function (){}}
> body (f)[[2]][[3]]
function() {
}
> class (body (f)[[2]][[3]])
[1] "call"
> class (eval (body (f)[[2]][[3]]))
[1] "function"

all.vars(expr, functions = FALSE) seems to return functions declarations (f <- function(){}) in the expression, while filtering out function calls ('+'(1,2), ...).

I'd say it is the other way round: in that expression f is the variable (name) which will be asssigned the function (once the call is evaluated). + (1, 2) evaluates to a numeric. Unless you keep it from doing so.

e <- expression (1 + 2)
> e <- expression (1 + 2)
> e [[1]]
1 + 2
> e [[1]][[1]]
`+`
> class (e [[1]][[1]])
[1] "name"
> eval (e [[1]][[1]])
function (e1, e2)  .Primitive("+")
> class (eval (e [[1]][[1]]))
[1] "function"
like image 111
cbeleites unhappy with SX Avatar answered Sep 21 '22 05:09

cbeleites unhappy with SX


Instead of looking for function definitions, which is going to be effectively impossible to do correctly without actually evaluating the functions, it will be easier to look for function calls.

The following function recursively spiders the expression/call tree returning the names of all objects that are called like a function:

find_calls <- function(x) {
  # Base case
  if (!is.recursive(x)) return()

  recurse <- function(x) {
    sort(unique(as.character(unlist(lapply(x, find_calls)))))
  }

  if (is.call(x)) {
    f_name <- as.character(x[[1]])
    c(f_name, recurse(x[-1]))
  } else {
    recurse(x)
  }
}

It works as expected for a simple test case:

x <- expression({
  f(3, g())
  h <- function(x, y) {
    i()
    j()
    k(l())
  }
})
find_calls(x)
# [1] "{"        "<-"       "f"        "function" "g"        "i"        "j"  
# [8] "k"        "l"       
like image 23
hadley Avatar answered Sep 18 '22 05:09

hadley