haven::read_dta supports importing variable label from Stata into R using the label attribute. Rstudio also supports displaying these labels in the View pane.
However, when two data frames are bound using dplyr::bind_rows (or rbind_all), the labels are not preserved. Is this a bug?
library(dplyr)
id <- 1:5
attr(id, "label") <- "unit id"
df1 <- tbl_df(data.frame(id)) # label is fine
df1$id
# [1] 1 2 3 4 5
# attr(,"label")
# [1] "unit id"
df2 <- tbl_df(data.frame(id)) # label is fine
df2$id
# [1] 1 2 3 4 5
# attr(,"label")
# [1] "unit id"
df_bound <- bind_rows(df1, df2) # label is gone
df_bound$id
# [1] 1 2 3 4 5 1 2 3 4 5
                A workaround is to use rbind instead of bind_rows. You must then make sure that the column names are equal.
Use setdiff(names(df1), names(df2)) to get column names that are in df1 but not in df2, and setdiff(names(df2), names(df1)) vice versa.
The sjlabelled package by Daniel Lüdecke is a nice solution for problems like this when working with labelled data. I used the copy_labels function for a similar issue :
library(dplyr)  
library(sjlabelled) 
id <- 1:5  
attr(id, "label") <- "unit id"  
df1 <- tbl_df(data.frame(id))  
str(df1)   
# tibble [5 × 1] (S3: tbl_df/tbl/data.frame)  
# $ id: int [1:5] 1 2 3 4 5  
# ..- attr(*, "label")= chr "unit id"  
df2 <- tbl_df(data.frame(id)) # label is fine  
df_bound <- bind_rows(df1, df2) # label is gone  
str(df_bound)  
# tibble [10 × 1] (S3: tbl_df/tbl/data.frame)  
#  $ id: int [1:10] 1 2 3 4 5 1 2 3 4 5   
df_bound <- copy_labels(df_bound, df1)  
df_bound_labelled <- df_bound %>% mutate_at(vars(id), as_labelled)
str(df_bound_labelled)  
# tibble [10 × 1] (S3: tbl_df/tbl/data.frame)  
# $ id: int [1:10] 1 2 3 4 5 1 2 3 4 5  
#  ..- attr(*, "label")= chr "unit id"  
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With