I am struggling with variable labels of data.frame columns. Say I have the following data frame (part of a much larger data frame):
data <- data.frame(age = c(21, 30, 25, 41, 29, 33),
                   sex = factor(c(1, 2, 1, 2, 1, 2), labels = c("Female", "Male")))
I also have a named vector with the variable labels for this data frame:
var.labels <- c(age = "Age in Years", sex = "Sex of the participant")
I want to assign the variable labels in var.labels to the columns in the data frame data using the function label from the Hmisc package. I can do them one by one like this and check the result afterwards:
> label(data[["age"]]) <- "Age in years"
> label(data[["sex"]]) <- "Sex of the participant"
> label(data)
                     age                      sex
          "Age in years" "Sex of the participant"
The variable labels are assigned as attributes of the columns:
> attr(data[["age"]], "label")
[1] "Age in years"
> attr(data[["sex"]], "label")
[1] "Sex of the participant"
Wonderful. However, with a larger data frame, say 100 or more columns, this will not be convenient or efficient. Another option is to assign them as attributes directly:
> attr(data, "variable.labels") <- var.labels
Does not help. The variable labels are not assigned to the columns:
> label(data)
age sex
 ""  ""
Instead, they are assigned as an attribute of the data frame itself (see the last component of the list):
> attributes(data)
$names
[1] "age" "sex"

$row.names
[1] 1 2 3 4 5 6

$class
[1] "data.frame"

$variable.labels
                     age                      sex
          "Age in Years" "Sex of the participant"
And this is not what I want. I need the variable labels as attributes of the columns. I tried to write the following function (and many others):
set.var.labels <- function(dataframe, label.vector){
  column.names <- names(dataframe)
  dataframe <- mapply(label, column.names, label.vector)
  return(dataframe)
}
And then execute it:
> set.var.labels(data, var.labels)
Did not help. It returns the values of the vector var.labels but does not assign the variable labels. If I try to assign it to a new object, it just contains the values of the variable labels as a vector.
To understand value labels in R, you need to understand the factor data structure. You can use the factor() function to create your own value labels: use factor() for nominal data and ordered() for ordinal data. R's statistical and graphics functions will then treat the data appropriately.
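As a small illustration of value labels, here is a minimal base-R sketch. The sex vector mirrors the one in the question; the satisfaction variable and its levels are hypothetical and only there to show ordered().

```r
# Nominal data: the codes 1 and 2 get the value labels "Female" and "Male"
sex <- factor(c(1, 2, 1, 2, 1, 2), levels = c(1, 2), labels = c("Female", "Male"))
levels(sex)
table(sex)

# Ordinal data (hypothetical variable): the levels carry an order
satisfaction <- ordered(c(2, 1, 3, 2), levels = c(1, 2, 3),
                        labels = c("Low", "Medium", "High"))
below.high <- satisfaction < "High"  # comparisons respect the level order
below.high
```

Note that these are labels for the values of a variable, which is a separate matter from the variable labels the question asks about.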
A variable label is a human-readable description of a variable. R supports rather long variable names, and these names can even contain spaces and punctuation, but short variable names make coding easier. A variable label can give a nice, long description of the variable.
You can do this by creating a list from the named vector of var.labels and assigning that to the label values. I've used match to ensure that values of var.labels are assigned to their corresponding column in data even if the order of var.labels is different from the order of the data columns.
library(Hmisc)
var.labels = c(age="Age in Years", sex="Sex of the participant")
label(data) = as.list(var.labels[match(names(data), names(var.labels))])
label(data)
                     age                      sex
          "Age in Years" "Sex of the participant"
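For comparison, here is a minimal base-R sketch of the function attempted in the question, assuming all you need is the "label" attribute on each column (the attribute shown in the question's attr() output). It does not use Hmisc, so it will not add the "labelled" class that, as far as I know, Hmisc's label<- also attaches. Columns without an entry in the label vector are simply left alone, which is a design choice of this sketch rather than Hmisc behaviour.

```r
# Base-R sketch: attach each label as the "label" attribute of its column.
set.var.labels <- function(dataframe, label.vector) {
  # Match by name, so the order of label.vector does not matter
  for (column in intersect(names(dataframe), names(label.vector))) {
    attr(dataframe[[column]], "label") <- label.vector[[column]]
  }
  dataframe  # a modified copy is returned; the caller must reassign it
}

data <- data.frame(
  age = c(21, 30, 25, 41, 29, 33),
  sex = factor(c(1, 2, 1, 2, 1, 2), labels = c("Female", "Male"))
)
# Deliberately in a different order than the columns
var.labels <- c(sex = "Sex of the participant", age = "Age in Years")

data <- set.var.labels(data, var.labels)
attr(data[["age"]], "label")
attr(data[["sex"]], "label")
```

The key point is the reassignment data <- set.var.labels(data, var.labels): R functions work on a copy of their arguments, so calling the function without assigning the result leaves the original data frame unchanged.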
Original Answer
My original answer used lapply, which isn't actually necessary. Here's the original answer for archival purposes:
You can assign the labels using lapply:
label(data) = lapply(names(data), function(x) var.labels[match(x, names(var.labels))])
lapply applies a function to each element of a list or vector. In this case the function is applied to each value of names(data) and it picks out the label value from var.labels that corresponds to the current value of names(data).
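To see what that lapply call produces on its own, here is a small base-R sketch of just the lookup step. The column.names vector stands in for names(data), and var.labels is deliberately given in a different order to show that match pairs labels with columns by name.

```r
var.labels <- c(sex = "Sex of the participant", age = "Age in Years")
column.names <- c("age", "sex")  # stand-in for names(data)

# One lookup per column name; the result is a list with one label per column
looked.up <- lapply(column.names,
                    function(x) var.labels[match(x, names(var.labels))])
str(looked.up)

# The vectorised form from the updated answer does the same lookups in one call
direct <- var.labels[match(column.names, names(var.labels))]
direct
```

Both forms return the labels in column order, which is why the lapply wrapper is not strictly needed.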
Reading through a few tutorials is a good way to get the general idea, but you'll really get the hang of it if you start using lapply in different situations and see how it behaves.
I highly recommend using the Hmisc::upData() function.
Here is a reprex:
set.seed(22)
data <- data.frame(age = floor(rnorm(6,25,10)), sex = gl(2,1,6, labels = c("f","m")))
var.labels <- c(age = "Age in Years", sex = "Sex of the participant")
dplyr::as.tbl(data) # as tibble ---------------------------------------------
#> # A tibble: 6 × 2
#>     age    sex
#>   <dbl> <fctr>
#> 1    19      f
#> 2    49      m
#> 3    35      f
#> 4    27      m
#> 5    22      f
#> 6    43      m
data <- Hmisc::upData(data, labels = var.labels) # update data --------------
#> Input object size: 1328 bytes; 2 variables 6 observations
#> New object size: 2096 bytes; 2 variables 6 observations
Hmisc::label(data) # check new labels ---------------------------------------
#>                      age                      sex
#>           "Age in Years" "Sex of the participant"
Hmisc::contents(data) # data dictionary -------------------------------------
#>
#> Data frame:data 6 observations and 2 variables Maximum # NAs:0
#>
#>
#>                     Labels Levels   Class Storage
#> age           Age in Years        integer integer
#> sex Sex of the participant      2         integer
#>
#> +--------+------+
#> |Variable|Levels|
#> +--------+------+
#> |   sex  |  f,m |
#> +--------+------+