
How to use data within a function in an R package?

I am currently writing a function for an R package. The function should (a) take a data frame as input and (b) check one of its columns against a list of acceptable values.

These acceptable values come from another organization, in a .csv file. I would like to load this .csv file and use it as a reference to check whether the user's column contains valid values.

For example, let's say the user has these data:

set.seed(1839)
user <- data.frame(x=sample(letters,10),
                   y=rnorm(10))
user

   x          y
1  v -0.7025836
2  p -1.4586245
3  f  0.1987113
4  y  1.0544690
5  o -0.7112214
6  m  0.2956671
7  b  0.3016737
8  a -0.0945271
9  x -0.2790357
10 c  0.1681388

And the .csv contains many (useful) columns, but I only care about one (z) for the moment:

ref <- data.frame(z=letters[1:4], a=rnorm(4), b=(rnorm(4)))
ref

  z          a          b
1 a -0.3563105  1.4536406
2 b  1.6841862  1.3232985
3 c  1.3073516 -0.6978598
4 d  0.4352904 -0.3971175

The code I would like to run is (note: I am not calling library in the actual function, I am just doing it here for simplicity's sake):

library(dplyr)
valid_values <- ref %>%
  select(z) %>% 
  unname() %>% 
  unlist() %>% 
  as.character()

summary <- user %>% 
  mutate(x_valid=ifelse(x %in% valid_values, TRUE, FALSE))

summary tells me which values of x in user are valid:

   x          y x_valid
1  v -0.7025836   FALSE
2  p -1.4586245   FALSE
3  f  0.1987113   FALSE
4  y  1.0544690   FALSE
5  o -0.7112214   FALSE
6  m  0.2956671   FALSE
7  b  0.3016737    TRUE
8  a -0.0945271    TRUE
9  x -0.2790357   FALSE
10 c  0.1681388    TRUE

Now, what do I replace ref with in my function code? Where should I store this data in my package? How do I load it? And what type of file should I convert it to?

The function should look something like:

x_check <- function(data) {

  # get valid values
  valid_values <- ??? %>%
    select(z) %>% 
    unname() %>% 
    unlist() %>% 
    as.character()

  # compare against valid values
  return(
    data %>% 
    mutate(x_valid=ifelse(x %in% valid_values, TRUE, FALSE))
  )
}

What do I replace the ??? with to get my data? I do not care much whether or not the user is able to see this ref data I wish to load in.


I am using devtools::load_all("directory/for/my/package") to test my package. Relevant session information:

R version 3.4.0 (2017-04-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.3 (Maipo)

other attached packages:
[1] roxygen2_6.0.1             devtools_1.13.2
Mark White asked Jul 11 '17 20:07


2 Answers

I figured it out, in case anyone comes across this in the future. I saved the reference data in the package's data/ directory and loaded it into the function's local environment:

x_check <- function(data) {

  # get reference data
  data("ref", envir=environment())

  # get valid values
  valid_values <- ref %>%
    select(z) %>% 
    unname() %>% 
    unlist() %>% 
    as.character()

  # compare against valid values
  return(
    data %>% 
    mutate(x_valid=ifelse(x %in% valid_values, TRUE, FALSE))
  )
}
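
To see what envir=environment() does in the function above, here is a minimal sketch. It assumes the built-in women dataset (from the datasets package) as a stand-in for ref, and load_locally is a hypothetical name used only for illustration. The data is loaded into the function's own environment rather than the user's workspace.

```r
# Sketch only: `women` stands in for the package's `ref` object,
# and `load_locally` is a made-up name for this illustration.
load_locally <- function() {
  # Load the dataset into this function's environment, not the global one
  data("women", package = "datasets", envir = environment())
  # Confirm the object exists locally (inherits = FALSE: do not look in parents)
  exists("women", envir = environment(), inherits = FALSE)
}

load_locally()
```

Because the object lives only inside the function call, nothing is added to or overwritten in the caller's workspace.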
Mark White answered Oct 18 '22 01:10


See Hadley Wickham's book on writing R packages, where he explains how to store data in a package:

"The most common location for package data is (surprise!) data/. Each file in this directory should be a .RData file created by save() containing a single object (with the same name as the file)."
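
As a rough sketch of what the quoted passage describes, the following writes a single object named ref to data/ref.RData with save(). The pkg_root path is a temporary directory standing in for the real package directory, and the small data frame stands in for the organization's .csv (which would normally be read with read.csv()); both are assumptions for illustration.

```r
# Hypothetical stand-in for the package's root directory
pkg_root <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Stand-in for the reference table; in practice something like
# ref <- read.csv("path/to/reference.csv") would go here
ref <- data.frame(z = letters[1:4], stringsAsFactors = FALSE)

# One object per file, with the file named after the object
rdata_path <- file.path(pkg_root, "data", "ref.RData")
save(ref, file = rdata_path)

# Sanity check: load the file into a fresh environment and inspect it
e <- new.env()
loaded <- load(rdata_path, envir = e)
loaded
```

The usethis package offers use_data() as a convenience wrapper for this step; it writes a .rda file instead, which R treats the same way.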

This makes the dataset available to any user of your package as packagename::ref (where ref is the name of the saved object), provided the package's DESCRIPTION file sets LazyData: true.
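
A self-contained sketch of the check itself, written in base R. Here x_check_sketch is a hypothetical name, and ref is passed as an argument only so the example runs outside a package; inside the package the reference data would come from the stored dataset instead. The x values are copied from the question's example.

```r
# Hypothetical helper: `ref` is an argument here so the sketch runs on its own;
# inside a package it would come from the package's stored data instead.
x_check_sketch <- function(data, ref) {
  # Reference column as a plain character vector
  valid_values <- as.character(ref$z)
  # %in% already returns TRUE/FALSE, so no ifelse() wrapper is needed
  data$x_valid <- data$x %in% valid_values
  data
}

# Example input mirroring the question's data
user <- data.frame(x = c("v", "p", "f", "y", "o", "m", "b", "a", "x", "c"),
                   stringsAsFactors = FALSE)
ref <- data.frame(z = letters[1:4], stringsAsFactors = FALSE)

out <- x_check_sketch(user, ref)
out$x[out$x_valid]
```

With this input, the rows flagged as valid are those where x is b, a, or c, matching the table in the question.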

Paul Rougieux answered Oct 18 '22 00:10