I am currently writing a function for an R package. Part of what this function is meant to do is (a) take data as input and (b) check one of its columns against a list of acceptable values.
These acceptable values come from another organization, in a .csv file. I would like to load this .csv file and use it as a reference to check whether the user's column contains valid values.
For example, let's say the user has these data:
set.seed(1839)
user <- data.frame(x=sample(letters,10),
y=rnorm(10))
user
x y
1 v -0.7025836
2 p -1.4586245
3 f 0.1987113
4 y 1.0544690
5 o -0.7112214
6 m 0.2956671
7 b 0.3016737
8 a -0.0945271
9 x -0.2790357
10 c 0.1681388
The .csv contains many (useful) columns, but for the moment I only care about one of them, z:
ref <- data.frame(z=letters[1:4], a=rnorm(4), b=(rnorm(4)))
ref
z a b
1 a -0.3563105 1.4536406
2 b 1.6841862 1.3232985
3 c 1.3073516 -0.6978598
4 d 0.4352904 -0.3971175
The code I would like to run is the following (note: I am not calling library() in the actual function; I only do it here for simplicity's sake):
library(dplyr)
valid_values <- ref %>%
select(z) %>%
unname() %>%
unlist() %>%
as.character()
summary <- user %>%
mutate(x_valid=ifelse(x %in% valid_values, TRUE, FALSE))
summary
tells me which values of x in user are valid:
x y x_valid
1 v -0.7025836 FALSE
2 p -1.4586245 FALSE
3 f 0.1987113 FALSE
4 y 1.0544690 FALSE
5 o -0.7112214 FALSE
6 m 0.2956671 FALSE
7 b 0.3016737 TRUE
8 a -0.0945271 TRUE
9 x -0.2790357 FALSE
10 c 0.1681388 TRUE
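As an aside, the select() %>% unname() %>% unlist() %>% as.character() chain can be replaced by extracting the column directly, and because %in% already returns TRUE/FALSE, the ifelse(..., TRUE, FALSE) wrapper adds nothing. A minimal base-R sketch of the same check, rebuilding only the x column from the printed output above and only column z of ref (y, a and b are left out because the check does not use them):

```r
# Rebuild only the x column shown in the printed output above
user <- data.frame(x = c("v", "p", "f", "y", "o", "m", "b", "a", "x", "c"),
                   stringsAsFactors = FALSE)
# Only column z of the reference table matters for this check
ref <- data.frame(z = letters[1:4], stringsAsFactors = FALSE)

# as.character() also covers the case where z was read in as a factor
valid_values <- as.character(ref$z)

# %in% already returns a logical vector, so no ifelse() is needed
user$x_valid <- user$x %in% valid_values
user
```

This should flag only b, a and c as valid, matching the dplyr result above.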
Now, what do I replace ref with in my function code? Where should I store this data in my package? How do I load it? And what type of file should I convert it to?
The function should look something like:
x_check <- function(data) {
# get valid values
valid_values <- ??? %>%
select(z) %>%
unname() %>%
unlist() %>%
as.character()
# compare against valid values
return(
data %>%
mutate(x_valid=ifelse(x %in% valid_values, TRUE, FALSE))
)
}
What do I replace the ??? with to get my data? I do not much care whether the user is able to see the ref data I load.
I am using devtools::load_all("directory/for/my/package") to test my package. Relevant session information:
R version 3.4.0 (2017-04-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.3 (Maipo)
other attached packages:
[1] roxygen2_6.0.1 devtools_1.13.2
I figured it out, in case anyone comes across this in the future. I loaded the data from the package's data/ directory into the function's local environment:
x_check <- function(data) {
# get reference data
data("ref", envir=environment())
# get valid values
valid_values <- ref %>%
select(z) %>%
unname() %>%
unlist() %>%
as.character()
# compare against valid values
return(
data %>%
mutate(x_valid=ifelse(x %in% valid_values, TRUE, FALSE))
)
}
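To see what envir = environment() does, here is a small self-contained sketch. It relies on what ?data documents: when package is NULL, data() also looks in a data/ subdirectory of the current working directory, so a temporary folder stands in for the package's data/ folder here. Inside a real package, passing package = "yourpkg" (a placeholder for your package's name) to data() is probably the more robust choice.

```r
# Temporary directory standing in for the package's data/ folder
# (relies on data() with package = NULL also searching ./data)
tmp <- tempfile("pkgdemo")
dir.create(file.path(tmp, "data"), recursive = TRUE)

# One object per .rda file, with the file named after the object
ref <- data.frame(z = letters[1:4], stringsAsFactors = FALSE)
save(ref, file = file.path(tmp, "data", "ref.rda"))
rm(ref)

old_wd <- setwd(tmp)

x_check <- function(data) {
  # R skips the non-function argument `data` when looking up data(),
  # so this still calls utils::data(); ref lands in this function's frame
  data("ref", envir = environment())
  valid_values <- as.character(ref$z)
  data$x_valid <- data$x %in% valid_values
  data
}

result <- x_check(data.frame(x = c("a", "q", "c"), stringsAsFactors = FALSE))
setwd(old_wd)

result
# ref was loaded only inside x_check(), so it is not in the global environment
exists("ref", envir = globalenv(), inherits = FALSE)
```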
See Hadley Wickham's book R Packages, where he explains how to store data in a package:
"The most common location for package data is (surprise!) data/. Each file in this directory should be a .RData file created by save() containing a single object (with the same name as the file)."
This makes the dataset accessible to any user of your package as packagename::datasetname (e.g. packagename::ref), provided the package's DESCRIPTION file sets LazyData: true.
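Following that quote, converting the organization's .csv into package data can be sketched with read.csv() and save(). The .csv here is a hypothetical stand-in written to a temporary file so the example runs on its own, and the temporary data folder stands in for the data/ directory at the root of your package.

```r
# Hypothetical stand-in for the .csv received from the other organization
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(z = letters[1:4], a = rnorm(4), b = rnorm(4)),
          csv_path, row.names = FALSE)

# Read the reference table, keeping z as character rather than factor
ref <- read.csv(csv_path, stringsAsFactors = FALSE)

# In a real package this would be <package root>/data/ref.rda
data_dir <- file.path(tempfile("pkgdemo"), "data")
dir.create(data_dir, recursive = TRUE)

# One object per file, with the file named after the object
save(ref, file = file.path(data_dir, "ref.rda"))

# Sanity check: loading the file back restores a single object called ref
check_env <- new.env()
loaded <- load(file.path(data_dir, "ref.rda"), envir = check_env)
loaded
```

Since the file is named after the single object it contains, data("ref") can find it by name.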