haven::read_dta supports importing variable labels from Stata into R using the label attribute. RStudio also supports displaying these labels in the View pane. However, when two data frames are bound using dplyr::bind_rows (or rbind_all), the labels are not preserved. Is this a bug?
library(dplyr)
id <- 1:5
attr(id, "label") <- "unit id"
df1 <- tbl_df(data.frame(id)) # label is fine
df1$id
# [1] 1 2 3 4 5
# attr(,"label")
# [1] "unit id"
df2 <- tbl_df(data.frame(id)) # label is fine
df2$id
# [1] 1 2 3 4 5
# attr(,"label")
# [1] "unit id"
df_bound <- bind_rows(df1, df2) # label is gone
df_bound$id
# [1] 1 2 3 4 5 1 2 3 4 5
A workaround is to use rbind instead of bind_rows. You must then make sure that both data frames have the same column names. Use setdiff(names(df1), names(df2)) to get the column names that are in df1 but not in df2, and setdiff(names(df2), names(df1)) for the reverse.
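The column check described above could look roughly like the sketch below. The data frames and the age and income columns are made up for illustration, and filling the missing columns with NA is just one possible way to align them before calling rbind.

```r
# Two hypothetical data frames whose columns only partly overlap.
df1 <- data.frame(id = 1:3, age = c(30, 41, 25))
df2 <- data.frame(id = 4:5, income = c(100, 200))

# Column names present in one data frame but not in the other.
only_in_df1 <- setdiff(names(df1), names(df2))
only_in_df2 <- setdiff(names(df2), names(df1))

# Assumption: missing columns are acceptable as NA. Add them so that
# both data frames end up with the same set of column names.
df1[only_in_df2] <- NA_real_
df2[only_in_df1] <- NA_real_

# rbind matches data frame columns by name, so the column order in
# df2 does not have to match the order in df1.
bound <- rbind(df1, df2)
print(bound)
```

If both setdiff calls return character(0), the column names already agree and rbind can be called directly.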
The sjlabelled package by Daniel Lüdecke is a nice solution for problems like this when working with labelled data. I used the copy_labels function for a similar issue:
library(dplyr)
library(sjlabelled)
id <- 1:5
attr(id, "label") <- "unit id"
df1 <- tbl_df(data.frame(id))
str(df1)
# tibble [5 × 1] (S3: tbl_df/tbl/data.frame)
# $ id: int [1:5] 1 2 3 4 5
# ..- attr(*, "label")= chr "unit id"
df2 <- tbl_df(data.frame(id)) # label is fine
df_bound <- bind_rows(df1, df2) # label is gone
str(df_bound)
# tibble [10 × 1] (S3: tbl_df/tbl/data.frame)
# $ id: int [1:10] 1 2 3 4 5 1 2 3 4 5
df_bound <- copy_labels(df_bound, df1)
df_bound_labelled <- df_bound %>% mutate_at(vars(id), as_labelled)
str(df_bound_labelled)
# tibble [10 × 1] (S3: tbl_df/tbl/data.frame)
# $ id: int [1:10] 1 2 3 4 5 1 2 3 4 5
# ..- attr(*, "label")= chr "unit id"
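If adding a package is not an option, the same idea can be sketched in base R: copy the label attribute column by column from a data frame that still has it. The helper name restore_labels is hypothetical (it is not part of dplyr, haven, or sjlabelled), and the bound data frame is built by hand here to stand in for a result that has lost its labels.

```r
# Hypothetical helper: copy the "label" attribute from `source` to the
# columns of `target` that share a name with it.
restore_labels <- function(target, source) {
  for (col in intersect(names(target), names(source))) {
    lab <- attr(source[[col]], "label", exact = TRUE)
    if (!is.null(lab)) {
      attr(target[[col]], "label") <- lab
    }
  }
  target
}

# A labelled data frame, as in the examples above.
df1 <- data.frame(id = 1:5)
attr(df1$id, "label") <- "unit id"

# Stand-in for a bound result whose label has been dropped.
df_bound <- data.frame(id = c(1:5, 1:5))

df_bound <- restore_labels(df_bound, df1)
print(attr(df_bound$id, "label"))
```

This only handles the variable label; value labels or other attributes would need the same treatment.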