Restrict R functions for an exam

Tags:

r

An unusual question: What is the simplest way to restrict R to certain functions for an exam? For instance, in our use case we would like students to use functions for distributions, but not other functions. So calls to qt, dt, pt, rt (and so on) are allowed, but not other functions.

We use the safe exam browser, so are able to call a specific version of R or use a specific startup file (.Rprofile).

Johannes Titz asked Sep 01 '25 15:09


2 Answers

The following is a rough outline. I would never recommend this as a security-hardened sandbox: there may be subtle ways to break out of it, and I would at least work on it for a few days and request feedback from multiple sources before having any confidence in it.

But as a toy sandbox for an exam it may be good enough.

It essentially enforces a list of allowed functions by masking all other functions that are provided by the core R packages. This is in contrast to using a blocklist, since the latter will basically be guaranteed to always miss something.

However, out of laziness the following uses a hybrid approach rather than a pure allowlist. This is because we need to allow most primitive R functions, otherwise the console would be completely broken. We therefore maintain a (rather small) blocklist of primitive functions and allow all other primitives. But we block all the rest (i.e. non-primitive functions).

To block functions globally, we insert a “firewall” environment into the environment search path, which will be hit before any of the masked, attached environments. And since we mask all functions/operators that could be used to circumvent the usual name lookup rules, users (hopefully) aren’t able to circumvent this firewall.
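As a rough illustration of the masking idea (not the answer's actual code), the sketch below attaches an environment to the search path and puts a failing stub for mean() into it. The name firewall_demo is made up for this example. It also shows why `::` has to be blocked: a namespace-qualified call never consults the search path.

```r
# Attach an empty environment at position 2 of the search path,
# i.e. right after the global environment and before all packages.
firewall <- attach(NULL, name = "firewall_demo")

# Mask `mean` with a stub that always fails.
assign("mean", function(...) stop("Forbidden"), envir = firewall)

# Ordinary name lookup now finds the stub first.
masked_result <- tryCatch(mean(1:10), error = function(e) conditionMessage(e))

# A namespace-qualified call bypasses the search path, which is why
# `::` and `:::` are on the block list.
bypass_result <- base::mean(1:10)

# Clean up the demo environment.
detach("firewall_demo", character.only = TRUE)
```

Here masked_result holds the "Forbidden" message while bypass_result holds the real mean, so masking alone is not enough unless every lookup shortcut is masked too.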

One last thing is that this code needs to be executed after all core R packages have been attached, which is done by base::.First.sys(). Unfortunately R doesn’t provide a customisation hook that we could execute at this point. But we can override base::.First.sys().
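The override pattern itself (keep a reference to the original function, then replace a locked binding with a wrapper that calls it first) can be tried safely on a throwaway environment. The sketch below assumes a stand-in environment demo_env and a stand-in function startup_hook, both invented for illustration; it does not touch base::.First.sys().

```r
# Stand-in for the base environment, with a locked binding.
demo_env <- new.env()
assign("startup_hook", function() "core startup", envir = demo_env)
lockBinding("startup_hook", demo_env)

# Keep a reference to the original so that the wrapper can still call it.
original_hook <- get("startup_hook", envir = demo_env)

wrapped_hook <- function() {
  result <- original_hook()       # run the original logic first
  paste(result, "+ extra setup")  # then do the additional work
}

# A locked binding must be unlocked before it can be reassigned.
unlockBinding("startup_hook", demo_env)
assign("startup_hook", wrapped_hook, envir = demo_env)
lockBinding("startup_hook", demo_env)

hook_output <- demo_env$startup_hook()
```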

Put the following into the site or user profile:

local({
  # We need to allow most primitives; so we take a (dangerous!) shortcut of allowing
  # them all, and then selectively removing (potentially) dangerous functions from them.
  primitives = function () {
    fun_names = utils::lsf.str(baseenv())
    Filter(\(x) is.primitive(get(x, baseenv())), fun_names)
  }

  block_list = c(
    '::',
    ':::',
    'as.environment',
    'browser',
    'baseenv',
    'emptyenv',
    'environment<-',
    'Exec',
    'globalenv',
    'lazyLoadDBfetch',
    'on.exit',
    'pos.to.env'
  )

  allow_list = c(
    # Put explicitly allowed stuff here!
    'qt', 'dt', 'pt', 'rt',
    'print',
    'q', 'quit',
    setdiff(primitives(), block_list)
  )

  stop = stop
  forbidden = function (...) stop('Forbidden')

  # The subsequent logic has to happen after the core packages are attached — i.e. after
  # `.First.sys()` is run. Since there is no suitable hook, we override the latter:

  .First.sys = .First.sys

  first_sys = function () {
    .First.sys()

    core_packages = sub('package:', '', search()[startsWith(search(), 'package:')])
    all_exported_functions = unlist(lapply(core_packages, \(pkg) getNamespaceExports(pkg)))
    forbidden_names = setdiff(all_exported_functions, allow_list)

    # Eagerly load all names, otherwise they will not be able to be loaded later on.
    lapply(allow_list, \(name) get(name))

    forbidden_list = stats::setNames(
      lapply(forbidden_names, \(.) forbidden),
      forbidden_names
    )

    # ‘compiler’ needs special handling.

    eapply(loadNamespace('compiler'), force, all.names = TRUE)

    `:::` = `:::`
    match_call = match.call
    eval = eval
    getNamespace = getNamespace
    ..getNamespace = ..getNamespace

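    # Use the same argument names as the real `:::`, so that the call captured by
    # match.call() can be forwarded to it without "unused argument" errors.
    forbidden_list$`:::` = function (pkg, name) {
      if (as.character(substitute(pkg)) == 'compiler') {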
        call = match_call()
        call[[1L]] = `:::`
        eval(call)
      } else {
        stop('Forbidden')
      }
    }

    # Allow calling this only once; this is required by the R interpreter, but we can
    # disable subsequent calls by the user.
    first_call = TRUE
    self = environment()

    forbidden_list$getNamespace = function (name) {
      if (name == 'compiler' && first_call) {
        self$first_call = FALSE
        getNamespace(name)
      } else {
        stop('Forbidden')
      }
    }

    list2env(
      forbidden_list,
      envir = attach(NULL, name = 'blocked')
    )
  }

  unlockBinding('.First.sys', baseenv())
  assign('.First.sys', first_sys, envir = baseenv())
})

Some more caveats:

  1. If the user is able to control the session start, the above won’t work. This includes quitting the R session and starting a new session from the shell, as well as hitting Ctrl-C while R is starting, to abort execution of the profile code.
  2. The above disables debugging/tracing functions and error handling customisation (which can be done via options() and possibly via Sys.setenv()), but it’s possible that I overlooked some clever way of breaking into the scope of a called function. If that ever happens, it’s game over: at that point the user can freely roam around “behind” the firewall environment, and therefore call any functions they desire.
  3. Adding functions to the allow_list requires care! Adding the wrong function will completely circumvent the block. For example, allowing as.environment() may seem innocuous. But as.environment(3) gives the user access to the search path beyond the firewall, and thus breaks the sandbox.
  4. The above causes an (innocuous, as far as I can tell) error message at startup. I’m sure this can be avoided, but it requires digging through the R startup source code, to figure out where this is coming from.
  5. Auto-completion isn’t working, since that is implemented via function calls that are blocked. It could be reenabled with some extra work.
  6. I have only tested this directly in the native R terminal. It’s possible that other terminal shells for R work slightly differently and break the sandbox (or lead to other issues).
  7. The above code contains an exception to allow loading the ‘compiler’ package, which is required by the R interpreter. Unfortunately this punches a gaping hole into our firewall, since that package allows lots of shenanigans. I am almost certain that this could be exploited. A better (i.e. non-proof-of-concept) implementation would tighten the code further to ensure that getNamespace('compiler') can only be called by the R interpreter, not by the user. This is “left as an exercise to the reader”.
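Caveat 3 can be made concrete with a small sketch. It assumes a demo mask called escape_demo (a made-up name) and looks up the target environment by name rather than by numeric index, since positions on the search path depend on what is attached.

```r
# Mask `rev` with a failing stub, as the firewall does for blocked functions.
firewall <- attach(NULL, name = "escape_demo")
assign("rev", function(...) stop("Forbidden"), envir = firewall)

# Normal lookup hits the stub.
blocked <- tryCatch(rev(1:3), error = function(e) conditionMessage(e))

# as.environment() hands back an attached environment directly ...
base_env <- as.environment("package:base")
# ... so the real function can be fetched from behind the mask.
escaped <- get("rev", envir = base_env)(1:3)

# Clean up the demo environment.
detach("escape_demo", character.only = TRUE)
```

Any allowed function that returns an environment, or that accepts one as a lookup target, opens the same route.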

Konrad Rudolph answered Sep 04 '25 05:09


I would recommend not trying to whitelist functions. Firstly, R allows direct manipulation of language objects which would make it possible to circumvent any approach that I can think of. Secondly, R has no concept of permissions in the way that an operating system does, and I can't see any way to prevent students from undoing whatever you put in place.
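To illustrate what manipulating language objects means here, the following minimal sketch builds a call to sum() without the name ever appearing literally in the code. Whether this defeats a particular restriction depends on which helpers (as.call, eval, do.call and so on) remain callable, so treat it as an illustration of the general point rather than a specific exploit.

```r
# Assemble a function name at run time, so it never appears literally.
fn_name <- paste0("su", "m")

# Build the call sum(1, 2, 3) as a language object and evaluate it.
call_obj <- as.call(list(as.name(fn_name), 1, 2, 3))
result <- eval(call_obj)

# do.call() accepts the name as a string and does the same job.
result_via_do_call <- do.call(fn_name, list(1, 2, 3))
```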

Logging will not work either

A more practical approach might be to log everything students do. Initially, I thought this might be a solution. However, in writing this answer, I realised that it is not, so I will explain why. Firstly, if you log to a local file then students will need write access to that file, which means that they may be able to delete or amend the logs. So instead you could send the logs to a web server. Let's do it in R here for simplicity.

logging_server.R

#* Log received messages
#* @post /log
function(entry, username, nodename) {
    log_file <- sprintf(
        "r_command_log-%s-%s.txt", username, nodename
    )
    cat(entry, file = log_file, append = TRUE, sep = "\n")
    list(status = "success")
}

run_server.R

In this case I'll run it on localhost, i.e. the same machine that I am running the R code on, but in reality you would want an actual server.

library(plumber)
pr <- plumb("logging_server.R")
pr$run(host = "0.0.0.0", port = 8000)

You can then run this from the terminal:

Rscript run_server.R

.Rprofile

Then add the following to your .Rprofile.

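log_command <- function(expr, value, ok, visible) {
    # deparse() can return several lines for long expressions, so collapse them.
    log_entry <- paste0(
        "[", Sys.time(), "]: ",
        paste(deparse(expr), collapse = " ")
    )

    httr::POST(
        # 0.0.0.0 is a bind address; connect to the loopback address instead.
        url = "http://127.0.0.1:8000/log",
        body = list(
            entry = log_entry,
            username = Sys.info()[["user"]],
            nodename = Sys.info()[["nodename"]]
        ),
        encode = "form"
    )

    # Returning TRUE keeps the callback registered for the next top-level task.
    TRUE
}
addTaskCallback(log_command)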

Example R script

Let's say I run this:

qt(0.15, 5, lower.tail = FALSE)
message("I am cheating now!")

A file is created called r_command_log-samr-samdesktop.txt which contains:

[2024-10-08 11:19:36.343392]: addTaskCallback(log_command)
[2024-10-08 11:19:37.046776]: qt(0.15, 5, lower.tail = FALSE)
[2024-10-08 11:19:37.427953]: message("I am cheating now!")

Why this will not work

There are several caveats that occur to me:

  1. A user could do removeTaskCallback() - I don't think this is a huge problem on its own as it should be logged and would be pretty clear evidence of cheating.
  2. A user could disconnect from the network. This is a much bigger problem and could be a deal breaker, depending on your environment. Is it possible for your users to do this either physically or through software? If so they could disconnect, remove the callback, reconnect, do what they need to do, disconnect, re-create the callback and add some dummy code that looks innocent.
  3. What happens if the log is not created for some reason? It may be evidence that there has been tampering but it is not definitive.
  4. This is the big one. If a function exits with an error, there is no callback. For example, stop("I am cheating without a callback!") does not create an entry in the log. This means someone could do this:
{
    message("forbidden function")
    stop("I am cheating without a callback!")
}

As the block exits with an error, it does not create an entry in the log. It might be possible to add a custom error handler by setting options(error = ...), but I think this could probably be circumvented by manipulating the stack frame so that R does not realise what the last error actually was.
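For reference, an error handler along those lines might look like the sketch below. The helper names format_error_entry and log_error are invented for this example, and the handler only prints the entry instead of posting it to the logging server. It would be installed with options(error = log_error), which is left out of the code here because it changes how non-interactive scripts behave on errors.

```r
# Hypothetical helper: turn the most recent error message into a log line.
format_error_entry <- function() {
  paste0("[", format(Sys.time()), "] ERROR: ", trimws(geterrmessage()))
}

# Handler meant to be installed with options(error = log_error).
# Here it only prints; a real setup would send the entry to the server.
log_error <- function() {
  message(format_error_entry())
}

# Demonstrate the formatting on an error recorded by try().
try(stop("I am cheating without a callback!"), silent = TRUE)
entry <- format_error_entry()
```

Since a student can simply reset the option, this narrows the gap but does not close it, which is consistent with the conclusion of this answer.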

So I think the principle of this answer is that it is unlikely to be possible to achieve your goal securely through whitelisting or logging.

SamR answered Sep 04 '25 05:09