
Get a list of the data sets in a particular package

Tags: r

I would like to list, in the console, all the data sets in a particular R package. I know that calling data() with no arguments lists the data sets in all loaded packages, but that is not what I want: I want only the data sets from one specific package. The following attempt does not work.

    data()
    data('arules')
    # Warning message:
    # In data("arules") : data set ‘arules’ not found

I would also like to get the dimensions (dim) of every data set in that package.

S Das, asked Dec 30 '14 17:12



1 Answer

There's some good info on this in the details section of help(data). Here are the basics, using the plyr package as an example. For starters, let's see what's available from data().

    names(data())
    # [1] "title"   "header"  "results" "footer"
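Of those four elements, results is the one that holds the data set names. As a minimal sketch, the following inspects it using the datasets package, which ships with R (a substitution for plyr so that the example runs without extra installs):

```r
# data() returns an object of class "packageIQR"
d <- data(package = "datasets")
class(d)

# The results element is a character matrix with one row per data set
colnames(d$results)
# [1] "Package" "LibPath" "Item"    "Title"

# Peek at the names and descriptions of the first few data sets
head(d$results[, c("Item", "Title")])
```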

Further investigation of those elements will reveal what's in them. Next, we can use the arguments in data() and then subset the resulting list to find the names of the data sets in the package.

    d <- data(package = "plyr")
    ## names of data sets in the package
    d$results[, "Item"]
    # [1] "baseball" "ozone"
    ## assign it to use later
    nm <- d$results[, "Item"]
    ## call the promised data
    data(list = nm, package = "plyr")
    ## get the dimensions of each data set
    lapply(mget(nm), dim)
    # $baseball
    # [1] 21699    22
    #
    # $ozone
    # [1] 24 24 72
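The steps above can be wrapped in a small function. This is only a sketch: dataset_dims is a hypothetical name, and it uses the datasets package in the usage example so it runs on any R installation. It also handles one detail that plyr does not exhibit: some Item entries have the form "object (file)", such as "beaver1 (beavers)", meaning the object beaver1 is retrieved with data(beavers), as described in help(data). Loading into a private environment keeps the workspace clean.

```r
# Hypothetical helper: dim() of every data set in one package
dataset_dims <- function(pkg) {
  items <- data(package = pkg)$results[, "Item"]
  # "beaver1 (beavers)" -> object name "beaver1", retrieved via data(beavers)
  objs  <- sub(" \\(.*\\)$", "", items)
  files <- unique(sub("^.* \\((.*)\\)$", "\\1", items))
  # Load into a private environment instead of the global workspace
  env <- new.env()
  data(list = files, package = pkg, envir = env)
  # Keep only objects that were actually loaded
  found <- intersect(objs, ls(env))
  # dim() is NULL for plain vectors and univariate time series
  lapply(mget(found, envir = env), dim)
}

dims <- dataset_dims("datasets")
dims[["iris"]]
# [1] 150   5
```

Entries whose value is NULL are data sets without a dim attribute; length() may be more informative for those.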

Edit/Update: If you wish to find the names of the data sets in all installed packages, you can use the following. .packages(TRUE), short for .packages(all.available = TRUE), returns the names of all packages installed in the library trees given by lib.loc (by default, those in .libPaths()). Since the data sets in the base and stats packages have been moved to the datasets package, we need to account for that by taking them out with setdiff().

    ## names of all packages sans base and stats
    pkgs <- setdiff(.packages(TRUE), c("base", "stats"))
    ## get the names of all the data sets
    dsets <- data(package = pkgs)$results[, "Item"]
    ## look at the first few in our result
    head(dsets)
    # [1] "AirPassengers"          "BJsales"                "BJsales.lead (BJsales)"
    # [4] "BOD"                    "CO2"                    "ChickWeight"
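A bare vector of names does not say which package each data set comes from. As a rough sketch (list_datasets is a hypothetical helper, not part of any package), the Package and Title columns of the same results matrix can be kept alongside the names, with any " (file)" suffix stripped from Item:

```r
# Hypothetical helper: one row per data set, with its package and title
list_datasets <- function(pkgs = setdiff(.packages(all.available = TRUE),
                                         c("base", "stats"))) {
  res <- data(package = pkgs)$results
  data.frame(
    package = res[, "Package"],
    # "BJsales.lead (BJsales)" -> "BJsales.lead"
    object  = sub(" \\(.*\\)$", "", res[, "Item"]),
    title   = res[, "Title"],
    stringsAsFactors = FALSE
  )
}

# Restrict to one package here; call list_datasets() for everything installed
ds <- list_datasets("datasets")
head(ds)
table(ds$package)
```

Scanning every installed package can take a while on a large library, so passing an explicit vector of package names is usually quicker.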
Rich Scriven, answered Sep 29 '22 11:09