Consider this
do.call(rbind, list(data.table(x=1, b='x'),data.table(x=1, b=NA)))
returns
x b
1: 1 x
2: 1 NA
but
do.call(rbind, list(data.table(x=1, b=NA),data.table(x=1, b='x')))
returns
x b
1: 1 NA
2: 1 NA
How can I force the first behavior without reordering the contents of the list?
data.table is much faster than data.frame in my MapReduce jobs (I call data.table ~10*3MM times across 55 nodes, and it is many times faster), so I want this to work. Regards, saptarshi
As noted by Frank, the problem is that there are (somewhat invisibly) several different types of NA. The one produced when you type NA at the command line is of class "logical", but there are also NA_integer_, NA_real_, NA_character_, and NA_complex_.
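To see the different NA types directly, here is a small base R check. The variable name na_classes is only illustrative.

```r
# Each NA constant carries its own type; a bare NA is logical.
na_classes <- c(
  plain     = class(NA),
  integer   = class(NA_integer_),
  real      = class(NA_real_),
  character = class(NA_character_),
  complex   = class(NA_complex_)
)
print(na_classes)
```

All five print as NA at the console, which is why the difference is easy to miss.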
In your first example, the initial data.table sets the class of column b to "character", and the NA in the second data.table is then coerced to an NA_character_. In the second example, though, the NA in the first data.table sets column b's class to "logical", and, when the same column in the second data.table is coerced to "logical", its value 'x' is converted to a logical NA. (Try as.logical("x") to see why.)
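The two directions of coercion described above can be tried in plain R, without data.table. This is only a sketch of the underlying conversions, not of what rbind() does internally; coerced and promoted are illustrative names.

```r
# Character -> logical: "x" is not a recognised logical string, so it becomes NA.
coerced <- as.logical("x")
print(coerced)
print(class(coerced))

# Logical NA -> character: the value stays missing, but its type changes.
promoted <- as.character(NA)
print(is.na(promoted))
print(class(promoted))
```

The first conversion loses the value 'x'; the second loses nothing, which is why the order of the tables matters.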
That's all fairly complicated (to articulate, at least), but there is a reasonably simple solution. Just create a 1-row template data.table and prepend it to each list of data.tables you want to rbind(). It will establish the class of each column to be what you want, regardless of which data.tables follow it in the list passed to rbind(), and it can be trimmed off once everything else is bound together.
library(data.table)
## The two lists of data.tables from the OP
A <- list(data.table(x=1, b='x'),data.table(x=1, b=NA))
B <- list(data.table(x=1, b=NA),data.table(x=1, b='x'))
## A 1-row template, used to set the column types (and then removed)
DT <- data.table(x=numeric(1), b=character(1))
## Test it out
do.call(rbind, c(list(DT), A))[-1,]
# x b
# 1: 1 x
# 2: 1 NA
do.call(rbind, c(list(DT), B))[-1,]
# x b
# 1: 1 NA
# 2: 1 x
## Finally, as _also_ noted by Frank, rbindlist will likely be more efficient
rbindlist(c(list(DT), B))[-1,]
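If you control the code that builds the individual data.tables, another option (not part of the template approach, just a sketch that follows from the explanation of NA types) is to use a typed NA from the start, so that column b is character in every table. B_typed and res are illustrative names.

```r
library(data.table)

# Same data as list B, but the missing value is declared as character up front.
B_typed <- list(data.table(x = 1, b = NA_character_),
                data.table(x = 1, b = "x"))

# Every table already agrees on the column types, so no coercion is needed.
res <- rbindlist(B_typed)
print(sapply(res, class))
```

This avoids the template row entirely, at the cost of having to know the intended type wherever an NA is created.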