I want to create a souped-up version of dplyr::bind_rows that avoids the "Unequal factor levels: coercing to character" warnings when factor columns are present in the data frames we're trying to combine (which may also have non-factor columns). Here's an example:
df1 <- dplyr::data_frame(age = 1:3, gender = factor(c("male", "female", "female")), district = factor(c("north", "south", "west")))
df2 <- dplyr::data_frame(age = 4:6, gender = factor(c("male", "neutral", "neutral")), district = factor(c("central", "north", "east")))
Then bind_rows_with_factor_columns(df1, df2) returns (without warnings):
dplyr::data_frame(
  age = 1:6,
  gender = factor(c("male", "female", "female", "male", "neutral", "neutral")),
  district = factor(c("north", "south", "west", "central", "north", "east"))
)
Here's what I have so far:
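library(magrittr)  # provides %>%
bind_rows_with_factor_columns <- function(...) {
  # Collect the data frames into a list so purrr::map iterates over them
  dfs <- list(...)
  factor_columns <- purrr::map(dfs, function(df) {
    colnames(dplyr::select_if(df, is.factor))
  })
  if (length(unique(factor_columns)) > 1) {
    stop("All factor columns in dfs must have the same column names")
  }
  # Convert factors to character so bind_rows has nothing to coerce
  df_list <- purrr::map(dfs, function(df) {
    purrr::map_if(df, is.factor, as.character) %>% dplyr::as_data_frame()
  })
  # Bind the rows, then turn the original factor columns back into factors
  dplyr::bind_rows(df_list) %>%
    purrr::map_at(factor_columns[[1]], as.factor) %>%
    dplyr::as_data_frame()
}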
I'm wondering if anyone has any ideas on how to incorporate the forcats package to potentially avoid having to coerce factors to characters, or if anyone has any suggestions in general to boost the performance of this while maintaining the same functionality (I'd like to stick to tidyverse syntax). Thanks!
Going to answer my own question based on a great solution from a friend:
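bind_rows_with_factor_columns <- function(...) {
  # pmap walks the data frames column by column, in parallel, so this
  # assumes every data frame has the same columns in the same order
  purrr::pmap_df(list(...), function(...) {
    cols_to_bind <- list(...)
    if (all(purrr::map_lgl(cols_to_bind, is.factor))) {
      # Newer forcats versions need the list spliced with !!!;
      # older ones accepted fct_c(cols_to_bind) directly
      forcats::fct_c(!!!cols_to_bind)
    } else {
      unlist(cols_to_bind)
    }
  })
}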
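For anyone wondering what forcats::fct_c contributes here, the following is a minimal sketch of it on its own, using the gender columns from the question. The variable names gender1, gender2 and combined are just for illustration. It shows that the combined factor keeps its factor class and carries the union of the input levels, so nothing has to be coerced to character along the way.

```r
library(forcats)

# The two gender columns from df1 and df2 in the question
gender1 <- factor(c("male", "female", "female"))
gender2 <- factor(c("male", "neutral", "neutral"))

# fct_c concatenates factors and merges their level sets
combined <- fct_c(gender1, gender2)

# Still a factor, with all six values and every level from both inputs
is.factor(combined)
length(combined)
levels(combined)
```

The function above applies the same call once per factor column, which is why the levels of df1 and df2 do not need to match beforehand.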