R function returns a result on the second attempt even though the "if" statement raises an error

Question

Below are my code and its output. The first attempt raises an error, but the second attempt returns the result without raising an error. I'm curious about why this happens.

Define function

tmp_func <- function(x = 1)
 {
   x <- x+1
   if(!is.null(y=NULL))
   {
   }
   return(x)
 }

First attempt

c <- tmp_func()
#  Error in is.null(y = NULL) : 
#  supplied argument name 'y' does not match 'x'

Second attempt

c <- tmp_func()
c
#[1] 2

SamR · Accepted Answer

The second time you run this function it is compiled into byte code by R's JIT compiler. The effect you see is a quirk of how the compiler handles primitive functions such as is.null() with constant arguments that are special values, such as NULL, TRUE or FALSE.

`is.null()` does not take a `y` argument

The first thing to understand is that the x being referred to in your error message is not the x in your function but the x that is.null() expects as a parameter. The function signature is:

is.null
# function (x)  .Primitive("is.null")

We can make this particular error appear whenever we provide a named argument to a primitive function which it does not expect:

is.null(y = NULL)
# Error in is.null(y = NULL) : 
#   supplied argument name 'y' does not match 'x'
abs(y = 1)
# Error in abs(y = 1) : supplied argument name 'y' does not match 'x'

So is.null(y=NULL) should raise an error. To assign NULL to y and at the same time evaluate whether y is NULL, you need is.null(y <- NULL).
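A minimal sketch of the difference between the two spellings, meant to be run at the top level of a fresh session. The tryCatch() wrapper is only there so the error can be captured rather than stopping the script, and the variable names msg and is_y_null are illustrative.

```r
# Naming the argument `y` makes the interpreter reject the call,
# because is.null() only has a parameter called `x`.
msg <- tryCatch(
  is.null(y = NULL),
  error = function(e) conditionMessage(e)
)
msg

# With `<-` the argument is unnamed: y is assigned NULL in the calling
# environment, and the value of that assignment is what is.null() tests.
is_y_null <- is.null(y <- NULL)
is_y_null    # TRUE
exists("y")  # TRUE, y was created by the assignment
```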

Let's rename your function's parameter to a to reduce any ambiguity around x, and also remove the unnecessary addition operation:

tmp_func <- function(a = 1) {
    if (!is.null(y = NULL)) {}
    return(a)
}

tmp_func()
# Error in is.null(y = NULL) : 
#   supplied argument name 'y' does not match 'x'
tmp_func()
# [1] 1

We can see it is still complaining about x, rather than a.

The role of the JIT compiler

The question becomes why we do not see this error the second time. The reason is that R has a just-in-time (JIT) compiler, which compiles frequently used functions to byte code. You can check your JIT level by calling compiler::enableJIT() with a negative value, which returns the current level without changing it:

compiler::enableJIT(-1) 
# [1] 3

The default is 3. Yours must be at least 2, meaning small functions are compiled before their second use.
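As a small sketch of how querying and changing the level fit together: enableJIT() returns the level that was in force before the call, so the original setting can be saved and restored. The variable names here are illustrative.

```r
# A negative level only reports the current setting.
current <- compiler::enableJIT(-1)
current

# A non-negative level is applied, and the previous level is returned,
# so the original setting can be put back afterwards.
previous <- compiler::enableJIT(0)                # switch the JIT off
level_while_off <- compiler::enableJIT(previous)  # restore; returns 0
```

After the last line the session is back at whatever level it started with.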

Compiling the function changes how it handles this expression

The second time you run your function, it is compiled. We can see this if we print the function source after the first and second call:

tmp_func <- function(a = 1) {
    if (!is.null(y = NULL)) {}
    return(a)
}

tmp_func()
# Error in is.null(y = NULL) :
#   supplied argument name 'y' does not match 'x'

tmp_func # print source

# function(a = 1) {
#     if (!is.null(y = NULL)) {}
#     return(a)
# }
tmp_func()
# [1] 1

tmp_func # print source again

# function(a = 1) {
#     if (!is.null(y = NULL)) {}
#     return(a)
# }
# <bytecode: 0x3f42e28>

Note the final line, which shows that this function is now compiled into byte code. The compiled code seems not to care about the argument name: rather than calling is.null() in the usual way, it uses a dedicated instruction, and the way that instruction interfaces with .Primitive("is.null") in the C code does not check the name. We can see in the docs that constant NULL arguments are a special case (p6):

Certain constant values, such as TRUE, FALSE, and NULL appear very often in code. It may be useful to provide and use special instructions for loading these.

You can test this for yourself. If you change it to is.null(y = "a"), for example, the compiled code acts the same way as the interpreted code. Similarly, if you supply an unused argument to a non-primitive function the compiler raises the same error as the interpreter.
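One way to run such comparisons without relying on the JIT kicking in at the second call is to compile explicitly with compiler::cmpfun(). The following is only a sketch: outcome(), compare() and id() are hypothetical helpers written for this illustration, and no expected output is shown because the compiled results may depend on your R version.

```r
library(compiler)

# Hypothetical helper: call a zero-argument function and report either
# its formatted value or the error message it raised.
outcome <- function(f) {
  tryCatch(format(f()), error = function(e) paste("Error:", conditionMessage(e)))
}

# Hypothetical helper: run `fn` interpreted (JIT switched off so it is
# not compiled behind our back) and explicitly byte-compiled.
compare <- function(fn) {
  old <- enableJIT(0)
  on.exit(enableJIT(old))  # restore the previous JIT level on exit
  c(interpreted = outcome(fn), compiled = outcome(cmpfun(fn)))
}

id <- function(x) x  # an ordinary (non-primitive) closure

null_case    <- compare(function() is.null(y = NULL))  # case from the question
string_case  <- compare(function() is.null(y = "a"))   # non-special constant
closure_case <- compare(function() id(y = 1))          # non-primitive callee

null_case
string_case
closure_case
```

Each result is a named character vector, so any difference between the interpreted and compiled columns is easy to spot.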

However, in this special case of a constant NULL argument and a primitive function, the compiler ignores the argument name. We can see that in the instructions generated (note the LDNULL.OP to load the NULL constant):

compiler::disassemble(tmp_func)
list(12L, BASEGUARD.OP, 1L, 6L, LDNULL.OP, ISNULL.OP, 
    NOT.OP, 4L, BRIFNOT.OP, 5L, 14L, LDNULL.OP, GOTO.OP, 15L, 
    LDNULL.OP, POP.OP, GETVAR.OP, 7L, RETURN.OP)

This is technically a compiler bug

This is odd behaviour and I suppose technically it is a compiler bug. We can see that with this example:

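# disassemble() needs byte-compiled functions, so compile explicitly
f <- compiler::cmpfun(\(x) is.null(x = NULL)) # valid
g <- compiler::cmpfun(\(x) is.null(y = NULL)) # invalid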

compiler::disassemble(f)
# list(12L, BASEGUARD.OP, 0L, 6L, LDNULL.OP, ISNULL.OP, RETURN.OP)
compiler::disassemble(g)
# list(12L, BASEGUARD.OP, 0L, 6L, LDNULL.OP, ISNULL.OP, RETURN.OP)

The second byte code should not be the same as the first, but it is, because the name of the argument is ignored.

However, is.null(y = NULL) is not really an expression that I would worry about. This issue appears to only affect primitive functions supplied with an incorrectly named constant argument which is TRUE, FALSE or NULL. So as far as compiler bugs go, this doesn't seem to me like a very important one. In any case, if you want to ensure that your code is never JIT compiled, or establish whether something is caused by JIT compilation, you can disable JIT compilation:

compiler::enableJIT(0)

JIT is disabled if the argument is 0. If level is 1 then larger closures are compiled before their first use. If level is 2, then some small closures are also compiled before their second use. If level is 3 then in addition all top level loops are compiled before they are executed.
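As a rough check that the JIT is what makes the second call behave differently, the sketch below disables it, defines the function afresh, and calls it several times while capturing the error message. With the JIT off the function is never compiled, so every attempt should fail in the same way. The names old_level and attempts are illustrative.

```r
old_level <- compiler::enableJIT(0)  # disable the JIT, remembering the old level

tmp_func <- function(a = 1) {
  if (!is.null(y = NULL)) {}
  return(a)
}

# Call the function three times, recording either the returned value
# (as text) or the error message for each attempt.
attempts <- vapply(1:3, function(i) {
  tryCatch(as.character(tmp_func()), error = function(e) conditionMessage(e))
}, character(1))
attempts

compiler::enableJIT(old_level)       # restore the previous setting
```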

You can read more about JIT compilation in the docs.
