I have two data.table
s that I want to join (form a Cartesian product of). One of the data.table
s is keyed on a Date
vector, and the other on a numeric
vector:
# data.table with dates (as numeric)
dtDates2 = data.table(date =
as.numeric(seq(from = as.Date('2014/01/01'),
to = as.Date('2014/07/01'), by = 'weeks')),
data1 = rnorm(26))
# data.table with dates
dtDates1 = data.table(date =
seq(from = as.Date('2014/01/01'),
to = as.Date('2014/07/01'), by = 'weeks'),
data1 = rnorm(26))
# data.table with customer IDs
dtCustomers = data.table(customerID = seq(1, 100),
data2 = rnorm(100))
I setkey
and try to cross-join them using CJ
:
# cross join the two datatables
setkey(dtCustomers, customerID)
setkey(dtDates1, date)
setkey(dtDates2, date)
CJ(dtCustomers, dtDates1)
CJ(dtCustomers, dtDates2)
but get the following error:
Error in FUN(X[[1L]], ...) :
Invalid column: it has dimensions. Can't format it. If it's the result of data.table(table()), use as.data.table(table()) instead.
Not sure what I am doing wrong.
There is no cross join functionality available in data.table
out of the box.
Yet there is CJ.dt
function (a CJ
like but designed for data.tables) to achieve cartesian product (cross join) available in optiRum
package (available in CRAN).
You can create the function:
CJ.dt = function(X,Y) {
stopifnot(is.data.table(X),is.data.table(Y))
k = NULL
X = X[, c(k=1, .SD)]
setkey(X, k)
Y = Y[, c(k=1, .SD)]
setkey(Y, NULL)
X[Y, allow.cartesian=TRUE][, k := NULL][]
}
CJ.dt(dtCustomers, dtDates1)
CJ.dt(dtCustomers, dtDates2)
Yet there is a FR for convenience way to perform cross join filled in data.table#1717, so you could check there if there is a nicer api for cross join.
thank you jangorecki for the very useful function
I had to add support for empty X and/or Y:
CJ.dt = function(X, Y) {
stopifnot(is.data.table(X), is.data.table(Y))
if(nrow(X) > 0 & nrow(Y) > 0){
k = NULL
X = X[, c(k = 1, .SD)]
setkey(X, k)
Y = Y[, c(k = 1, .SD)]
setkey(Y, NULL)
return(X[Y, allow.cartesian = T][, k := NULL][])
} else {
duplicatedNames <- names(Y)[names(Y) %in% names(X)]
if(length(duplicatedNames) > 0) {
setnames(Y, duplicatedNames, paste0("i.", duplicatedNames))
}
setkey(Y)
setkey(X)
return(cbind(X[!X], Y[!Y]))
}
}
# X <- data.table(a = c(1, 2))
# Y <- data.table(a = c(2, 3), b = c(4, 5))
#
# CJ.dt(X, Y)
# CJ.dt(X[a > 2], Y)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With