I'd like to stack a list of data.frames, but sometimes the columns have different data types. I'd like the operation to coerce to the lowest common denominator (which is usually character
in my case).
This stacking occurs inside a package function that accepts almost any list of data.frames. It doesn't realistically have the ability to coerce ds_a$x
to a character before bind_rows()
.
ds_a <- data.frame(
x = 1:6,
stringsAsFactors = FALSE
)
ds_b <- data.frame(
x = c("z1", "z2"),
stringsAsFactors = FALSE
)
# These four implementations throw:
# Error: Can not automatically convert from integer to character in column "x".
ds_1 <- dplyr::bind_rows(ds_a, ds_b)
ds_2 <- dplyr::bind_rows(ds_b, ds_a)
ds_3 <- dplyr::bind_rows(list(ds_a, ds_b))
ds_4 <- dplyr::union_all(ds_a, ds_b)
I'd like the output to be a data.frame with a single character vector:
x
1 1
2 2
3 3
4 4
5 5
6 6
7 z1
8 z2
I have some long-term plans to use meta-data from the (REDCap) database to influence the coercion, but I'm hoping there's a short-term general solution for the stacking operation.
We can use rbindlist
from data.table
library(data.table)
rbindlist(list(ds_a, ds_b))
# x
#1: 1
#2: 2
#3: 3
#4: 4
#5: 5
#6: 6
#7: z1
#8: z2
Recently I switched to an approach that keeps all columns as strings initially(when converting from plain-text to a data.frame), then stacks, and finally converts the columns to an appropriate data type after it has all the rows to make a decision (using readr::type_convert()
).
It mimics this example. I haven't done any performance comparisons but there wasn't a noticeable difference (the internet is the real bottleneck). Also, I kinda like the idea of reducing the number of data type conversions.
library(magrittr)
col_types <- readr::cols(.default = readr::col_character())
raw_a <- "x,y\n1,21\n2,22\n3,23\n4,24\n5,25\n6,26"
raw_b <- "x,y\nz1,31\nz2,32"
ds_a <- readr::read_csv(raw_a, col_types=col_types)
ds_b <- readr::read_csv(raw_b, col_types=col_types)
list(ds_a, ds_b) %>%
dplyr::bind_rows() %>%
readr::type_convert()
#> Parsed with column specification:
#> cols(
#> x = col_character(),
#> y = col_double()
#> )
#> # A tibble: 8 x 2
#> x y
#> <chr> <dbl>
#> 1 1 21
#> 2 2 22
#> 3 3 23
#> 4 4 24
#> 5 5 25
#> 6 6 26
#> 7 z1 31
#> 8 z2 32
Created on 2019-12-03 by the reprex package (v0.3.0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With