How do you load a dataset from an R package using the data() function, and assign it directly to a variable without creating a duplicate copy in your environment?
Put simply, can you do this without creating two identical dfs in your environment:
> data("faithful") # Old Faithful Geyser Data from datasets package
> x <- faithful
> ls() # Now I have 2 identical dfs - x and faithful - in my environment
[1] "faithful" "x"
> remove(faithful) # Now I've removed one of the redundant dfs
Try 1:
My first approach was to just assign data("faithful") to x. But data() returns a string, so now I have the df faithful and the character vector x in my environment.
> x <- data("faithful")
> x
[1] "faithful" # String, not the df "faithful" from the datasets package
> ls()
[1] "faithful" "x"
Try 2: Tried to get a little more sophisticated in my second attempt.
> x <- get(data("faithful")) # This works as far as assignment goes
> ls() # However I still get the duplicate copy
[1] "faithful" "x"
A short note about my motivation for trying to do this. I have an R package with 5 very large data.frames, each having the same columns. I want to efficiently generate the same calculated columns on all 5 data.frames. So I want to use data() within a list() constructor to get the 5 data.frames into a list. Then I want to use llply() and mutate() from the plyr package to iterate over the dfs in the list and create the calculated columns for each df. But I don't want to have duplicate copies of the 5 large datasets sitting in my environment, as this is within a Shiny App with a RAM limit.
edit: I was able to use both of @henfiber's methods from his answer to figure out how to lazy-load entire data.frames into a named list.
The first command here works for assigning a data.frame to a new variable name.
# this loads faithful into a variable x.
# Note we don't need to use the data() function to load faithful
> delayedAssign("x",faithful)
But I wanted to create a named list x with elements y = data(faithful), z = data(iris), etc.
I tried the below and it didn't work.
> x <- list(delayedAssign("y",faithful),delayedAssign("z", iris))
> ls()
[1] "x" "y" "z" # x is a list with 2 nulls, y & z are promises to faithful & iris
But I was finally able to build a named list of the data.frames, without leaving extra copies in my environment, in the following manner:
# define this function provided by henfiber
getdata <- function(...)
{
    e <- new.env()
    name <- data(..., envir = e)[1]
    e[[name]]
}
# now create your list, this gives you one object "x" of class list
# with elements "y" and "z" which are your data.frames
x <- list(y=getdata(faithful),z=getdata(iris))
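A minimal sketch of the end goal described above (one list, the same calculated column added to every element), using base R's lapply() and transform() in place of llply() and mutate(). The two list elements and the ratio column are illustrative assumptions, not taken from the actual package.

```r
# Illustrative list: two data.frames with the same columns (assumption for this sketch).
dfs <- list(
  full  = datasets::faithful,
  first = datasets::faithful[1:10, ]
)

# Add the same calculated column to every data.frame in the list.
dfs <- lapply(dfs, function(df) transform(df, ratio = eruptions / waiting))

# Each element now carries the extra column.
names(dfs$first)
```

lapply() keeps the list names, so the result can replace the original list in place rather than sitting next to it.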
Using a helper function:
# define this function
getdata <- function(...)
{
    e <- new.env()
    name <- data(..., envir = e)[1]
    e[[name]]
}
# now load your data calling getdata()
x <- getdata("faithful")
Or using an anonymous function:
x <- (function(...)get(data(...,envir = new.env())))("faithful")
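One caveat worth hedging: data() captures its ... arguments unevaluated, so a helper that forwards ... may not resolve the dataset name when it is called indirectly, for example through lapply(). A sketch of a variant that takes the name as a string and passes it through the list = argument instead; load_dataset is a hypothetical name introduced here for illustration.

```r
# Hypothetical variant of getdata() that takes the dataset name as a string.
load_dataset <- function(name, package = NULL) {
  e <- new.env()
  # list = passes the name as a character vector, which avoids data()'s
  # unevaluated capture of its ... arguments.
  utils::data(list = name, package = package, envir = e)
  get(name, envir = e, inherits = FALSE)
}

# Build a named list in one call; the datasets only ever live in the
# temporary environments and in the list itself.
x <- lapply(c(y = "faithful", z = "iris"), load_dataset)
str(x, max.level = 1)
```

Because the name is an ordinary character value, the same helper works for a vector of dataset names without any copies being created in the calling environment.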
You should also consider lazy loading your data by adding LazyData: true to the DESCRIPTION file of your package.
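When a package lazy-loads its data, its datasets can usually be reached through the package namespace without calling data() at all. A small sketch using the datasets package (assumed here to be lazy-loaded, as it is in a standard R installation):

```r
# Access a lazy-loaded dataset through the package namespace; no data() call needed.
x <- datasets::faithful

# Check whether an object named "faithful" was created in the global environment.
exists("faithful", envir = globalenv(), inherits = FALSE)
```

The assignment creates only x; the dataset itself stays inside the package.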
If you use RStudio, after running data("faithful") you'll see in the Environment panel that the "faithful" data.frame is shown as a "promise" (another, less common name is "thunk") and is greyed out. That means it is lazily evaluated by R and has not yet been loaded into memory. You can even lazy-load the "x" variable with the delayedAssign() function:
data("faithful") # lazy load "faithful"
delayedAssign("x", faithful) # lazy assign "x" with a reference to "faithful"
rm(faithful) # remove "faithful"
# Still nothing has been loaded into memory yet
summary(x) # now x has been loaded and evaluated
Learn more about lazy evaluation here.
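To see the laziness directly, the promise can be given a visible side effect. This is only an illustrative sketch; the names evaluated and lazy_x are made up for the demonstration.

```r
evaluated <- FALSE

# The braced block is the promise's expression; it runs only when lazy_x is first used.
delayedAssign("lazy_x", {
  evaluated <- TRUE
  datasets::faithful
})

before <- evaluated      # promise not forced yet
n_rows <- nrow(lazy_x)   # first use forces the promise
after  <- evaluated      # the side effect has now happened

c(before = before, after = after)
```

The flag only flips once lazy_x is actually touched, which is the same behaviour RStudio hints at with the greyed-out entry.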