Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"Correct" way to specifiy optional arguments in R functions

Tags:

function

r

People also ask

What is an optional argument in Functions in R?

Optional arguments are ones that don't have to be set by the user, either because they are given a default value, or because the function can infer them from the other data you have given it. Even though they don't have to be set, they often provide extra flexibility.

How do you write a function optional argument?

You can assign an optional argument using the assignment operator in a function definition or using the Python **kwargs statement. There are two types of arguments a Python function can accept: positional and optional. Optional arguments are values that do not need to be specified for a function to be called.

How do you indicate optional arguments?

To indicate optional arguments, Square brackets are commonly used, and can also be used to group parameters that must be specified together. To indicate required arguments, Angled brackets are commonly used, following the same grouping conventions as square brackets.

How do you specify an argument in R?

Adding a Default Value in R You can specify default values for any disagreements in the argument list by adding the = sign and default value after the respective argument. You can specify a default value for argument mult to avoid specifying mult=100 every time.


You could also use missing() to test whether or not the argument y was supplied:

fooBar <- function(x,y){
    if(missing(y)) {
        x
    } else {
        x + y
    }
}

fooBar(3,1.5)
# [1] 4.5
fooBar(3)
# [1] 3

To be honest I like the OP's first way of actually starting it with a NULL value and then checking it with is.null (primarily because it is very simply and easy to understand). It maybe depends on the way people are used to coding but the Hadley seems to support the is.null way too:

From Hadley's book "Advanced-R" Chapter 6, Functions, p.84 (for the online version check here):

You can determine if an argument was supplied or not with the missing() function.

i <- function(a, b) {
  c(missing(a), missing(b))
}
i()
#> [1] TRUE TRUE
i(a = 1)
#> [1] FALSE  TRUE
i(b = 2)
#> [1]  TRUE FALSE
i(1, 2)
#> [1] FALSE FALSE

Sometimes you want to add a non-trivial default value, which might take several lines of code to compute. Instead of inserting that code in the function definition, you could use missing() to conditionally compute it if needed. However, this makes it hard to know which arguments are required and which are optional without carefully reading the documentation. Instead, I usually set the default value to NULL and use is.null() to check if the argument was supplied.


These are my rules of thumb:

If default values can be calculated from other parameters, use default expressions as in:

fun <- function(x,levels=levels(x)){
    blah blah blah
}

if otherwise using missing

fun <- function(x,levels){
    if(missing(levels)){
        [calculate levels here]
    }
    blah blah blah
}

In the rare case that you thing a user may want to specify a default value that lasts an entire R session, use getOption

fun <- function(x,y=getOption('fun.y','initialDefault')){# or getOption('pkg.fun.y',defaultValue)
    blah blah blah
}

If some parameters apply depending on the class of the first argument, use an S3 generic:

fun <- function(...)
    UseMethod(...)


fun.character <- function(x,y,z){# y and z only apply when x is character
   blah blah blah 
}

fun.numeric <- function(x,a,b){# a and b only apply when x is numeric
   blah blah blah 
}

fun.default <- function(x,m,n){# otherwise arguments m and n apply
   blah blah blah 
}

Use ... only when you are passing additional parameters on to another function

cat0 <- function(...)
    cat(...,sep = '')

Finally, if you do choose the use ... without passing the dots onto another function, warn the user that your function is ignoring any unused parameters since it can be very confusing otherwise:

fun <- (x,...){
    params <- list(...)
    optionalParamNames <- letters
    unusedParams <- setdiff(names(params),optionalParamNames)
    if(length(unusedParams))
        stop('unused parameters',paste(unusedParams,collapse = ', '))
   blah blah blah 
}

There are several options and none of them are the official correct way and none of them are really incorrect, though they can convey different information to the computer and to others reading your code.

For the given example I think the clearest option would be to supply an identity default value, in this case do something like:

fooBar <- function(x, y=0) {
  x + y
}

This is the shortest of the options shown so far and shortness can help readability (and sometimes even speed in execution). It is clear that what is being returned is the sum of x and y and you can see that y is not given a value that it will be 0 which when added to x will just result in x. Obviously if something more complicated than addition is used then a different identity value will be needed (if one exists).

One thing I really like about this approach is that it is clear what the default value is when using the args function, or even looking at the help file (you don't need to scroll down to the details, it is right there in the usage).

The drawback to this method is when the default value is complex (requiring multiple lines of code), then it would probably reduce readability to try to put all that into the default value and the missing or NULL approaches become much more reasonable.

Some of the other differences between the methods will appear when the parameter is being passed down to another function, or when using the match.call or sys.call functions.

So I guess the "correct" method depends on what you plan to do with that particular argument and what information you want to convey to readers of your code.


Just wanted to point out that the built-in sink function has good examples of different ways to set arguments in a function:

> sink
function (file = NULL, append = FALSE, type = c("output", "message"),
    split = FALSE)
{
    type <- match.arg(type)
    if (type == "message") {
        if (is.null(file))
            file <- stderr()
        else if (!inherits(file, "connection") || !isOpen(file))
            stop("'file' must be NULL or an already open connection")
        if (split)
            stop("cannot split the message connection")
        .Internal(sink(file, FALSE, TRUE, FALSE))
    }
    else {
        closeOnExit <- FALSE
        if (is.null(file))
            file <- -1L
        else if (is.character(file)) {
            file <- file(file, ifelse(append, "a", "w"))
            closeOnExit <- TRUE
        }
        else if (!inherits(file, "connection"))
            stop("'file' must be NULL, a connection or a character string")
        .Internal(sink(file, closeOnExit, FALSE, split))
    }
}

I would tend to prefer using NULL for the clarity of what is required and what is optional. One word of warning about using default values that depend on other arguments, as suggested by Jthorpe. The value is not set when the function is called, but when the argument is first referenced! For instance:

foo <- function(x,y=length(x)){
    x <- x[1:10]
    print(y)
}
foo(1:20) 
#[1] 10

On the other hand, if you reference y before changing x:

foo <- function(x,y=length(x)){
    print(y)
    x <- x[1:10]
}
foo(1:20) 
#[1] 20

This is a bit dangerous, because it makes it hard to keep track of what "y" is being initialized as if it's not called early on in the function.