I want to create a souped-up version of dplyr::bind_rows that avoids the "Unequal factor levels: coercing to character" warnings when factor columns are present in the data frames I'm trying to combine (which may also have non-factor columns). Here's an example:
df1 <- tibble::tibble(age = 1:3, gender = factor(c("male", "female", "female")),
                      district = factor(c("north", "south", "west")))
df2 <- tibble::tibble(age = 4:6, gender = factor(c("male", "neutral", "neutral")),
                      district = factor(c("central", "north", "east")))
Then bind_rows_with_factor_columns(df1, df2) should return the following, without any warnings:
tibble::tibble(
  age = 1:6,
  gender = factor(c("male", "female", "female", "male", "neutral", "neutral")),
  district = factor(c("north", "south", "west", "central", "north", "east"))
)
Here's what I have so far:
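library(magrittr)  # provides %>%

bind_rows_with_factor_columns <- function(...) {
  # collect the data frames into a list so purrr::map() iterates over them
  dfs <- list(...)
  # names of the factor columns in each data frame
  factor_columns <- purrr::map(dfs, function(df) {
    colnames(dplyr::select_if(df, is.factor))
  })
  if (length(unique(factor_columns)) > 1) {
    stop("All factor columns in dfs must have the same column names")
  }
  # convert factors to character so bind_rows() has nothing to warn about
  df_list <- purrr::map(dfs, function(df) {
    purrr::map_if(df, is.factor, as.character) %>% tibble::as_tibble()
  })
  # bind, then turn the original factor columns back into factors
  dplyr::bind_rows(df_list) %>%
    purrr::map_at(factor_columns[[1]], as.factor) %>%
    tibble::as_tibble()
}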
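For reference, and as far as I know: since dplyr 1.0.0, bind_rows() combines columns through the vctrs package, and two factors with different levels are combined into a factor whose levels are the union of both, without the coercion warning. A quick check, assuming dplyr >= 1.0.0 and tibble are installed:

```r
# Assumes dplyr >= 1.0.0, where bind_rows() is built on vctrs
df1 <- tibble::tibble(age = 1:3, gender = factor(c("male", "female", "female")),
                      district = factor(c("north", "south", "west")))
df2 <- tibble::tibble(age = 4:6, gender = factor(c("male", "neutral", "neutral")),
                      district = factor(c("central", "north", "east")))

combined <- dplyr::bind_rows(df1, df2)

is.factor(combined$gender)   # the column stays a factor
levels(combined$gender)      # union of the levels of both inputs
levels(combined$district)
```

If this holds on your installed version, the custom function may only be needed for older dplyr releases.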
I'm wondering whether anyone has ideas on how to incorporate the forcats package to avoid coercing factors to characters, or any general suggestions for improving the performance of this while keeping the same functionality (I'd like to stick to tidyverse syntax). Thanks!
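As a minimal illustration of what forcats offers here: forcats::fct_c() concatenates factors directly, taking the union of their levels instead of coercing to character. A sketch using the gender columns from the example (the variable names are just for illustration):

```r
# fct_c() takes the factors to combine as separate arguments
male_female <- factor(c("male", "female", "female"))
male_neutral <- factor(c("male", "neutral", "neutral"))

gender <- forcats::fct_c(male_female, male_neutral)

gender          # still a factor; values keep their input order
levels(gender)  # "female" "male" "neutral"
```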
I'm going to answer my own question, based on a great solution from a friend:
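bind_rows_with_factor_columns <- function(...) {
  # pmap() walks the data frames in parallel, so each call receives the
  # i-th column of every data frame (columns are matched by position)
  purrr::pmap_df(list(...), function(...) {
    cols_to_bind <- list(...)
    if (all(purrr::map_lgl(cols_to_bind, is.factor))) {
      # fct_c() expects the factors as separate arguments; current forcats
      # versions require a list to be spliced, done here with do.call()
      do.call(forcats::fct_c, cols_to_bind)
    } else {
      # other columns are concatenated (unlist() drops classes such as Date)
      unlist(cols_to_bind)
    }
  })
}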
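One difference from the expected output in the question: factor() sorts levels alphabetically, whereas fct_c() keeps the union of levels in order of first appearance, so with this solution the district column ends up with levels north, south, west, central, east. If sorted levels matter, one option (an addition, not part of the friend's solution) is to re-sort after combining:

```r
district <- do.call(forcats::fct_c, list(
  factor(c("north", "south", "west")),
  factor(c("central", "north", "east"))
))
levels(district)         # "north" "south" "west" "central" "east"

# Re-create the factor with alphabetically sorted levels; the values are unchanged
district_sorted <- factor(district, levels = sort(levels(district)))
levels(district_sorted)  # "central" "east" "north" "south" "west"
```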