How to add metadata to a tibble

How does one add metadata to a tibble?

I would like a sentence describing each of my variable names such that I could print out the tibble with the associated metadata and if I handed it to someone who hadn't seen the data before, they could make some sense of it.

as_tibble(iris)

# A tibble: 150 × 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
# ... with 140 more rows

# Sepal.Length. Measured from sepal attachment to stem
# Sepal.Width. Measured at the widest point
# Petal.Length. Measured from petal attachment to stem
# Petal.Width. Measured at the widest point
# Species. Nomenclature based on Integrated Taxonomic Information System (ITIS), January 2018.

thanks!

Adrienne B asked Jan 08 '18 19:01


2 Answers

Sorry for the delayed response. But this topic has been bugging me since I first started learning R. In my work, assigning metadata to columns is not just common. It is required. That R didn't seem to have a nice way to do it was really bothering me. So much so, that I wrote some packages to do it.

The fmtr package has a function to assign the descriptions (plus other stuff). And the libr package has a dictionary function, so you can look at all the metadata you assign.

Here is how it works:

First, assign the descriptions to the columns. You just pass a named list to the descriptions() function.

library(fmtr)
library(libr)

# Create data frame
df <- iris

# Assign descriptions
descriptions(df) <- list(Sepal.Length = "Measured from sepal attachment to stem", 
                         Sepal.Width = "Measured at the widest point",
                         Petal.Length = "Measured from petal attachment to stem", 
                         Petal.Width = "Measured at the widest point",
                         Species = paste("Nomenclature based on Integrated Taxonomic", 
                                         "Information System (ITIS), January 2018."))


Then you can see all the metadata by calling the dictionary() function, like so:

dictionary(df)
# # A tibble: 5 x 10
#  Name  Column      Class  Label Description                                                 
#  <chr> <chr>       <chr>  <chr> <chr>                                                      
# 1 df    Sepal.Leng~ numer~ NA    Measured from sepal attachment to stem                     
# 2 df    Sepal.Width numer~ NA    Measured at the widest point                                
# 3 df    Petal.Leng~ numer~ NA    Measured from petal attachment to stem                      
# 4 df    Petal.Width numer~ NA    Measured at the widest point                                 
# 5 df    Species     factor NA    Nomenclature based on Integrated Taxonomic Information Syst~

If you like, you can return the dictionary as its own data frame, then save it or print it or whatever.

d <- dictionary(df)

Here is the dictionary data frame:

[Image: dictionary data frame]
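
For comparison, roughly the same idea can be sketched in base R without either package. This is not how fmtr or libr are implemented; `col_descriptions` and `describe_columns()` are hypothetical names used only for illustration, and storing the text in a column attribute called "description" is an assumption of this sketch.

```r
# Base-R sketch: attach one description per column as an attribute,
# then collect the attributes into a small "dictionary" data frame.
df <- iris

# Hypothetical named vector: column name -> description
col_descriptions <- c(
  Sepal.Length = "Measured from sepal attachment to stem",
  Sepal.Width  = "Measured at the widest point",
  Petal.Length = "Measured from petal attachment to stem",
  Petal.Width  = "Measured at the widest point",
  Species      = "Nomenclature based on ITIS, January 2018."
)

# Store each description on its column
for (nm in names(col_descriptions)) {
  attr(df[[nm]], "description") <- col_descriptions[[nm]]
}

# Hypothetical helper: one row per column, NA where no description is set
describe_columns <- function(data) {
  data.frame(
    Column = names(data),
    Class = vapply(data, function(col) class(col)[1], character(1)),
    Description = vapply(data, function(col) {
      d <- attr(col, "description", exact = TRUE)
      if (is.null(d)) NA_character_ else d
    }, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

dict <- describe_columns(df)
print(dict)
```

One caveat with plain attributes: subsetting an ordinary vector drops them, so descriptions stored this way may be lost after filtering rows and are best re-applied (or checked) afterwards.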

David J. Bosak answered Sep 29 '22 11:09


This seems tricky. In principle @hrbrmstr's comment is the way to go (i.e. use ?comment or ?attr to add attributes to any object), but these attributes will not be printed out by default. Attributes seem to be printed automatically for atomic objects:

> z <- 1:6
> attr(z,"hello") <- "goodbye"
> z
[1] 1 2 3 4 5 6
attr(,"hello")
[1] "goodbye"

... but not, alas, for data frames or tibbles:

> dd <- tibble::tibble(x=1:4,y=2:5)
> attr(dd,"metadata") <- c("some stuff","some more stuff")
> dd
# A tibble: 4 x 2
      x     y
  <int> <int>
1     1     2
2     2     3
3     3     4
4     4     5
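
The metadata is still attached even though it is not printed, and it can be read back explicitly. A minimal sketch, using a base data.frame only to keep it dependency-free (the same attr() and comment() calls work on a tibble):

```r
# The attribute is stored on the object; it just isn't shown by print()
dd <- data.frame(x = 1:4, y = 2:5)
attr(dd, "metadata") <- c("some stuff", "some more stuff")

# Read it back explicitly
meta <- attr(dd, "metadata")
print(meta)

# comment() is a built-in alternative; comments are never printed by print()
comment(dd) <- "Toy data for the metadata example"
note <- comment(dd)
print(note)
```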

You can wrap the object with its own S3 class to get this stuff printed:

> class(dd) <- c("my_tbl",class(dd))
> print.my_tbl <- function(x, ...) {
+    NextMethod()   # print the tibble as usual
+    print(attr(x,"metadata"))
+    invisible(x)
+ }
> dd
# A tibble: 4 x 2
      x     y
  <int> <int>
1     1     2
2     2     3
3     3     4
4     4     5
[1] "some stuff"      "some more stuff"

You could make the printing more elaborate or pretty, e.g.

cat("\nMETADATA:\n")
cat(sprintf("# %s",attr(x,"metadata")),sep="\n")
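
Put together, the prettier version of the print method might look like the sketch below. It uses a base data.frame so that it runs without tibble installed (an assumption of this example; the pattern is the same for a tibble), and the class name my_tbl and the "metadata" attribute name are carried over from above.

```r
# Sketch: a print method that shows the data first, then the metadata
dd <- data.frame(x = 1:4, y = 2:5)
attr(dd, "metadata") <- c("some stuff", "some more stuff")
class(dd) <- c("my_tbl", class(dd))

print.my_tbl <- function(x, ...) {
  NextMethod()                        # fall through to the regular print method
  meta <- attr(x, "metadata")
  if (!is.null(meta)) {               # only add the block when metadata exists
    cat("\nMETADATA:\n")
    cat(sprintf("# %s", meta), sep = "\n")
  }
  invisible(x)
}

print(dd)
```

Checking for NULL keeps the output clean for objects that carry the class but have no metadata attached.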

Nothing bad will happen if the other user hasn't defined print.my_tbl (the print method will fall back to the print method for tibbles), but the metadata will only be printed if they have your print.my_tbl definition ...

Ben Bolker answered Sep 29 '22 10:09