 

data.table avoid recycling

I'm constructing a data.table from two (or more) input vectors with different lengths:

library(data.table)

x <- c(1,2,3,4)
y <- c(8,9)

dt <- data.table(x = x, y = y)

I need the shorter vector(s) to be padded with NA rather than recycled, resulting in a data.table like this:

   x  y
1: 1  8
2: 2  9
3: 3 NA
4: 4 NA

Is there a way to achieve this without explicitly filling the shorter vector(s) with NA before passing them to the data.table() constructor?

Thanks!

asked Mar 18 '18 09:03 by Steffen J.


3 Answers

You can index each vector up to the maximum length; out-of-range indices return NA:

library("data.table")

x <- c(1,2,3,4)
y <- c(8,9)
n <- max(length(x), length(y))

dt <- data.table(x = x[1:n], y = y[1:n])
# > dt
#    x  y
# 1: 1  8
# 2: 2  9
# 3: 3 NA
# 4: 4 NA
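This works because indexing an atomic vector past its end returns `NA` instead of raising an error. A small illustration using the question's `y`; the `seq_len()` line is an aside, not part of the original answer:

```r
y <- c(8, 9)

# Positions 3 and 4 do not exist, so R returns NA for them
y[1:4]
#> [1]  8  9 NA NA

# seq_len(n) is a safer index than 1:n when n can be 0:
# 1:0 is c(1, 0), whereas seq_len(0) is an empty index
y[seq_len(0)]
#> numeric(0)
```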

Or, as @Roland recommended in a comment, extend both vectors to the common length by assigning to `length()`, which pads with NA:

length(y) <- length(x) <- max(length(x), length(y))
dt <- data.table(x, y)
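`length<-` pads with `NA` when the new length is larger, but it also truncates when the new length is smaller, which is why the assignment targets the maximum length. The chained form works because an assignment expression evaluates to its assigned value. A quick check on a throwaway vector `v` (a name chosen for illustration):

```r
v <- c(8, 9)

length(v) <- 4   # growing pads the vector with NA
v
#> [1]  8  9 NA NA

length(v) <- 1   # shrinking silently drops trailing elements
v
#> [1] 8
```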
answered Sep 20 '22 17:09 by jogo


One option is `cbind.fill` from the rowr package, converting the result with `as.data.table()`:

library(rowr)
library(data.table)
as.data.table(setNames(cbind.fill(x, y, fill = NA), c("x", "y")))

Alternatively, put the vectors in a list and pad each one with NA up to the length of the longest element:

library(data.table)
lst <- list(x = x, y = y)
as.data.table(lapply(lst, `length<-`, max(lengths(lst))))
#   x  y
#1: 1  8
#2: 2  9
#3: 3 NA
#4: 4 NA
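If this comes up often, the list approach can be wrapped in a small function that accepts any number of named vectors. `pad_data_table()` below is a hypothetical helper, not part of data.table, and is only a sketch assuming all inputs are atomic vectors:

```r
library(data.table)

# Hypothetical helper (not a data.table function): builds a data.table
# from named vectors of unequal length, padding shorter ones with NA
pad_data_table <- function(...) {
  lst <- list(...)
  n <- max(lengths(lst))                   # length of the longest input
  as.data.table(lapply(lst, `length<-`, n)) # pad every vector to n
}

dt <- pad_data_table(x = c(1, 2, 3, 4), y = c(8, 9), z = 5)
dt
```

Each vector keeps its own type; only its length changes.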
answered Sep 17 '22 17:09 by akrun


jogo's out-of-range-index answer extends cleanly to adding a column by reference with `:=`, using `.N` (the number of rows) as the target length:

x <- c(1,2,3,4)
y <- c(8,9)
n <- max(length(x), length(y))
dt <- data.table(x = x[1:n], y = y[1:n])

z <- c(6,7)
dt[, z := z[1:.N]]
#    x  y  z
# 1: 1  8  6
# 2: 2  9  7
# 3: 3 NA NA
# 4: 4 NA NA
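The same idea can add several columns at once. A sketch, assuming the new vectors are collected in a named list `new_cols` (a hypothetical name): `seq_len(.N)` is evaluated inside `dt[...]`, so each vector is indexed to exactly the number of rows, padding short vectors with `NA` and truncating any that are longer.

```r
library(data.table)

dt <- data.table(x = c(1, 2, 3, 4), y = c(8, 9, NA, NA))

# Hypothetical extra columns of unequal length
new_cols <- list(z = c(6, 7), w = 10)

# `[`(v, seq_len(.N)) indexes each vector to nrow(dt) elements;
# the parenthesised LHS tells := to use the character vector of names
dt[, (names(new_cols)) := lapply(new_cols, `[`, seq_len(.N))]
dt
```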
answered Sep 18 '22 17:09 by arau