bind_rows of different data types

Question

I'd like to stack a list of data.frames, but sometimes the columns have different data types. I'd like the operation to coerce to the lowest common denominator (which is usually character in my case).

This stacking occurs inside a package function that accepts almost any list of data.frames. It doesn't realistically have the ability to coerce ds_a$x to a character before bind_rows().

ds_a <- data.frame(
  x = 1:6,
  stringsAsFactors = FALSE
)
ds_b <- data.frame(
  x = c("z1", "z2"),
  stringsAsFactors = FALSE
)

# These four implementations throw:
# Error: Can not automatically convert from integer to character in column "x".
ds_1 <- dplyr::bind_rows(ds_a, ds_b)
ds_2 <- dplyr::bind_rows(ds_b, ds_a)
ds_3 <- dplyr::bind_rows(list(ds_a, ds_b))
ds_4 <- dplyr::union_all(ds_a, ds_b)

I'd like the output to be a data.frame with a single character vector:

I have some long-term plans to use meta-data from the (REDCap) database to influence the coercion, but I'm hoping there's a short-term general solution for the stacking operation.

akrun · Accepted Answer

We can use rbindlist from data.table

library(data.table)
rbindlist(list(ds_a, ds_b))
#    x
#1:  1
#2:  2
#3:  3
#4:  4
#5:  5
#6:  6
#7: z1
#8: z2

wibeasley · Answer

Recently I switched to an approach that keeps all columns as strings initially(when converting from plain-text to a data.frame), then stacks, and finally converts the columns to an appropriate data type after it has all the rows to make a decision (using readr::type_convert()).

It mimics this example. I haven't done any performance comparisons but there wasn't a noticeable difference (the internet is the real bottleneck). Also, I kinda like the idea of reducing the number of data type conversions.

library(magrittr)
col_types <- readr::cols(.default = readr::col_character())
raw_a <- "x,y
1,21
2,22
3,23
4,24
5,25
6,26"
raw_b <- "x,y
z1,31
z2,32"
ds_a <- readr::read_csv(raw_a, col_types=col_types)
ds_b <- readr::read_csv(raw_b, col_types=col_types)

list(ds_a, ds_b) %>% 
  dplyr::bind_rows() %>% 
  readr::type_convert()
#> Parsed with column specification:
#> cols(
#>   x = col_character(),
#>   y = col_double()
#> )
#> # A tibble: 8 x 2
#>   x         y
#>   <chr> <dbl>
#> 1 1        21
#> 2 2        22
#> 3 3        23
#> 4 4        24
#> 5 5        25
#> 6 6        26
#> 7 z1       31
#> 8 z2       32

^{Created on 2019-12-03 by the reprex package (v0.3.0)}

bind_rows of different data types

Tags:

r

dplyr

redcap

wibeasley

2 Answers

akrun

wibeasley

Recent Activity

Donate For Us

bind_rows of different data types

Tags:

r

dplyr

redcap

wibeasley

2 Answers

akrun

wibeasley

Related questions

Recent Activity

Donate For Us