
Unit testing outside of a package in R

Tags: unit-testing, r

We often use R in contexts where we want reproducibility in the face of modifications and where the code base is complex, but where we are not writing a package. testthat and the other testing packages seem geared towards unit testing package code. That makes sense, since packages are the most common case where you need a lot of testing and are not in full control of all the data. Still, I was wondering whether there is a good package or method for unit testing in R outside the context of writing a package.

For example, a lot of times in a package context you're testing something of the form:

foo <- function(bar) {
    # do something to bar (placeholder: pass it through unchanged)
    bar.outcome <- bar
    return(bar.outcome)
}

You then test that the function returns the expected output, that results are the right type, and that errors are handled properly. To do this, you create a directory for tests in your package and write them there; devtools can then use load_all and testthat to run them and report the results.

One thing I would like to be able to do is run the same sort of tests outside of a package, such as in a script. This matters because a lot of R code written in academia doesn't generalize to different contexts or data without considerable difficulty, so packaging it doesn't make much sense; at the same time, unit tests would make it easier to extend the code in future packages. That's the easy case.
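As a minimal sketch of this easy case, assuming the testthat package is installed: its expectations can be called from a plain script with no package structure around them. Here foo is a hypothetical stand-in with made-up behavior (doubling a number), not code from a real project.

```r
library(testthat)

# Hypothetical function under test: doubles a numeric input
foo <- function(bar) {
  if (!is.numeric(bar)) {
    stop("bar must be numeric")
  }
  bar * 2
}

# Checks expected output, type, and error handling in one test
test_that("foo doubles numeric input and rejects other types", {
  expect_equal(foo(2), 4)
  expect_type(foo(2), "double")
  expect_error(foo("a"), "must be numeric")
})
```

Saved to a file (say, a hypothetical test-foo.R), this could be run with testthat::test_file("test-foo.R"), and a directory of such files with testthat::test_dir().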

The harder case is something you rarely do in packages: testing the shape, kind, and state of the data. For example, I often read R code written in academia with comments like

data <- data %>% doSomething() #1023 rows 

parameter_df <- read.csv(parameter_file) # read file (parameter_file is a placeholder path)
print(parameter_df) # 5 columns 
data <- data %>% doSomething(param = parameter_df)

lapply(df, class) #should be char, char, char, numeric, Date

I like the idea that "every time you want to write a print statement, write a test instead", but I don't have a good framework for how this should be done in R, especially in this harder case, where you're not testing a function but checking that the data flowing through your program is correct.

The context here is that R is often used where the point of a script is replication in the scientific sense. There is potentially a lot to gain when people can easily extend others' scripts, released as part of replication materials, for new projects. That is much harder to do, especially with complex code, when there are no tests and the code is fragile and can fail in nontrivial and silent ways.

Juan Sebastian Lozano asked Sep 19 '19 00:09



1 Answer

The assertr package (https://github.com/ropensci/assertr) provides a framework better suited for testing data analysis workflows.
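As a rough sketch of what that can look like, assuming assertr and magrittr are installed: the comment-style checks from the question (row counts, column names, column classes) become assertions inside the pipeline. The data frame df and its columns are made up for illustration.

```r
library(assertr)
library(magrittr)

# Hypothetical stand-in for the data flowing through the script
df <- data.frame(
  id    = c("a", "b", "c"),
  value = c(1.5, 2.0, 3.25),
  day   = as.Date(c("2019-09-01", "2019-09-02", "2019-09-03")),
  stringsAsFactors = FALSE
)

checked <- df %>%
  verify(nrow(.) == 3) %>%                        # replaces a "# 3 rows" comment
  verify(has_all_names("id", "value", "day")) %>% # expected columns are present
  verify(is.character(id)) %>%                    # replaces "# should be char, ..."
  verify(is.numeric(value)) %>%
  verify(inherits(day, "Date")) %>%
  assert(not_na, value) %>%                       # no missing values in value
  assert(within_bounds(0, Inf), value)            # every value is non-negative
```

When every check passes, the data continues down the pipe, so the assertions can sit between ordinary transformation steps. By default a failed check stops the script with an error describing the failure, rather than letting bad data flow on silently.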

Mikko Marttila answered Sep 28 '22 20:09
