
How Can I Quickly Inspect Built-in Data Sets (PSA)?

Tags: r, dataset

One of the best ways to make a question reproducible is to use one of the built-in data sets. Browsing them with data(), however, is frustrating: it lists only names and titles, with no information about the structure of each data set.

How can I quickly view the structure of available data sets?

BrodieG asked Mar 04 '15 19:03


1 Answer

The following function may help:

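dataStr <- function(fun = function(x) TRUE)
  str(
    Filter(
      fun,                      # keep only the data sets that satisfy `fun`
      Filter(
        Negate(is.null),        # drop names that could not be resolved
        # look up every name listed by data(); unresolved names become NULL
        mget(data()$results[, "Item"], inherits = TRUE, ifnotfound = list(NULL))
  ) ) )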

It accepts a filtering function, applies it to all the data sets, and prints out the structure of the matching data sets. For example, if we're looking for matrices:

> dataStr(is.matrix)
List of 8
 $ WorldPhones          : num [1:7, 1:7] 45939 60423 64721 68484 71799 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:7] "1951" "1956" "1957" "1958" ...
  .. ..$ : chr [1:7] "N.Amer" "Europe" "Asia" "S.Amer" ...
 $ occupationalStatus   : 'table' int [1:8, 1:8] 50 16 12 11 2 12 0 0 19 40 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ origin     : chr [1:8] "1" "2" "3" "4" ...
  .. ..$ destination: chr [1:8] "1" "2" "3" "4" ...
 $ volcano              : num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...
--- 5 entries omitted ---

Or for data frames (also omitting entries):

> dataStr(is.data.frame)
List of 42
 $ BOD             :'data.frame': 6 obs. of  2 variables:
  ..$ Time  : num [1:6] 1 2 3 4 5 7
  ..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
  ..- attr(*, "reference")= chr "A1.4, p. 270"
 $ CO2             :Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame':  84 obs. of  5 variables:
  ..$ Plant    : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
  ..$ Type     : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
  ..$ conc     : num [1:84] 95 175 250 350 500 675 1000 95 175 250 ...
  ..$ uptake   : num [1:84] 16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
--- 40 entries omitted ---

Or even for simple vectors:

> dataStr(function(x) is.atomic(x) && is.vector(x) && !is.ts(x))
List of 4
 $ euro   : Named num [1:11] 13.76 40.34 1.96 166.39 5.95 ...
  ..- attr(*, "names")= chr [1:11] "ATS" "BEF" "DEM" "ESP" ...
 $ islands: Named num [1:48] 11506 5500 16988 2968 16 ...
  ..- attr(*, "names")= chr [1:48] "Africa" "Antarctica" "Asia" "Australia" ...
 $ precip : Named num [1:70] 67 54.7 7 48.5 14 17.2 20.7 13 43.4 40.2 ...
  ..- attr(*, "names")= chr [1:70] "Mobile" "Juneau" "Phoenix" "Little Rock" ...
 $ rivers : num [1:141] 735 320 325 392 524 ...
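
When the full str() output is more than you need, the same lookup can feed a compact overview instead. The following is only a sketch: dataSummary is a hypothetical helper, not part of the function above, and it assumes that data() lists some items in the form "beaver1 (beavers)", so it strips the parenthetical part before looking the object up.

```r
# Hypothetical variant of dataStr(): returns one row per data set
# (name, first class, dimensions or length) instead of printing str().
dataSummary <- function(fun = function(x) TRUE) {
  items <- data()$results[, "Item"]
  # Assumption: entries such as "beaver1 (beavers)" name the object first,
  # followed by the source file in parentheses; keep only the object name.
  items <- unique(sub(" \\(.*\\)$", "", items))
  # Search from the global environment through the attached packages;
  # names that cannot be found come back as NULL and are dropped.
  objs <- mget(items, envir = globalenv(), inherits = TRUE,
               ifnotfound = list(NULL))
  objs <- Filter(fun, Filter(Negate(is.null), objs))
  data.frame(
    name  = as.character(names(objs)),
    class = vapply(objs, function(x) class(x)[1], character(1),
                   USE.NAMES = FALSE),
    dims  = vapply(
      objs,
      function(x) {
        if (is.null(dim(x))) as.character(length(x))
        else paste(dim(x), collapse = " x ")
      },
      character(1), USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}
```

It takes the same kind of filtering function, so dataSummary(is.matrix) or dataSummary(is.data.frame) narrows the table in the same way as the calls above.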
BrodieG answered Nov 05 '22 01:11