
Bind rows of data frames with some factor columns

I want to create a souped-up version of dplyr::bind_rows that avoids the "Unequal factor levels: coercing to character" warnings when the data frames being combined contain factor columns (they may also have non-factor columns). Here's an example:

df1 <- dplyr::data_frame(age = 1:3, gender = factor(c("male", "female", "female")), district = factor(c("north", "south", "west")))
df2 <- dplyr::data_frame(age = 4:6, gender = factor(c("male", "neutral", "neutral")), district = factor(c("central", "north", "east")))

Then bind_rows_with_factor_columns(df1, df2) should return (without warnings):

dplyr::data_frame(
  age = 1:6,
  gender = factor(c("male", "female", "female", "male", "neutral", "neutral")),
  district = factor(c("north", "south", "west", "central", "north", "east"))
)

Here's what I have so far:

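`%>%` <- magrittr::`%>%`  # the pipe is not available unless dplyr/magrittr is attached

bind_rows_with_factor_columns <- function(...) {
  # Collect the data frames once; purrr::map() needs a single list to iterate over
  dfs <- list(...)

  # Names of the factor columns in each data frame
  factor_columns <- purrr::map(dfs, function(df) {
    colnames(dplyr::select_if(df, is.factor))
  })

  if (length(unique(factor_columns)) > 1) {
    stop("All factor columns in dfs must have the same column names")
  }

  # Convert factors to character so bind_rows() does not warn
  df_list <- purrr::map(dfs, function(df) {
    purrr::map_if(df, is.factor, as.character) %>% dplyr::as_data_frame()
  })

  # Bind, then turn the original factor columns back into factors
  dplyr::bind_rows(df_list) %>%
    purrr::map_at(factor_columns[[1]], as.factor) %>%
    dplyr::as_data_frame()
}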

I'm wondering if anyone has any ideas on how to incorporate the forcats package to potentially avoid having to coerce factors to characters, or if anyone has any suggestions in general to boost the performance of this while maintaining the same functionality (I'd like to stick to tidyverse syntax). Thanks!

Nick Resnick asked Oct 30 '22 11:10

1 Answer

I'm going to answer my own question, based on a great solution from a friend:

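bind_rows_with_factor_columns <- function(...) {
  # pmap walks the data frames column by column (matched by position)
  purrr::pmap_df(list(...), function(...) {
    cols_to_bind <- list(...)
    if (all(purrr::map_lgl(cols_to_bind, is.factor))) {
      # Combine the factors, keeping the union of their levels.
      # forcats >= 0.3.0 takes factors through `...`, so splice the list with !!!
      forcats::fct_c(!!!cols_to_bind)
    } else {
      unlist(cols_to_bind)
    }
  })
}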
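
To see what the factor branch is doing, here is a minimal sketch of forcats::fct_c on its own, using the district values from the question. The variable names (district1, district2, combined, combined_from_list) are only illustrative, and the spliced call assumes forcats 0.3.0 or later, where fct_c takes its inputs through `...`.

```r
library(forcats)

# Two factors with overlapping but unequal levels, as in the question's data.
district1 <- factor(c("north", "south", "west"))
district2 <- factor(c("central", "north", "east"))

# fct_c() concatenates the values and uses the union of the levels,
# so nothing is coerced to character along the way.
combined <- fct_c(district1, district2)

# When the factors are held in a list (as they are inside pmap),
# splice them into the call with !!! (assumes forcats >= 0.3.0).
combined_from_list <- fct_c(!!!list(district1, district2))

print(levels(combined))
print(identical(combined, combined_from_list))
```

Note that the combined levels follow the order in which they appear across the inputs, not alphabetical order, so the level order can differ from what factor(c(...)) would give on the concatenated values. If that matters, something like factor(combined, levels = sort(levels(combined))) should re-sort them.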
Nick Resnick answered Nov 11 '22 18:11