Using "by-argument" in "outer" data.table to filter "inner" data.table

Question

I still have some problems understanding the data.table notation. Could anyone explain why the following is not working?

I'm trying to classify dates into groups using cut. The breaks are stored in another data.table, and which breaks apply depends on the by argument of the outer "data" data.table:

library(data.table)

data <- data.table(A = c(1, 1, 1, 2, 2, 2),
                   DATE = as.POSIXct(c("01-01-2012", "30-05-2015", "01-01-2020", "30-06-2012", "30-06-2013", "01-01-1999"), format = "%d-%m-%Y"))

breaks <- data.table(B = c(1, 1, 2, 2),
                     BREAKPOINT = as.POSIXct(c("01-01-2015", "01-01-2016", "30-06-2012", "30-06-2013"), format = "%d-%m-%Y"))

data[, bucket := cut(DATE, breaks[B == A, BREAKPOINT], ordered_result = TRUE), by = A]

I can get the desired result by handling each group separately:

# expected
data[A == 1, bucket := cut(DATE, breaks[B == 1, BREAKPOINT], ordered_result = TRUE)]
data[A == 2, bucket := cut(DATE, breaks[B == 2, BREAKPOINT], ordered_result = TRUE)]
data
#    A       DATE     bucket
# 1: 1 2012-01-01         NA
# 2: 1 2015-05-30 2015-01-01
# 3: 1 2020-01-01         NA
# 4: 2 2012-06-30 2012-06-30
# 5: 2 2013-06-30         NA
# 6: 2 1999-01-01         NA

Thanks, Michael

eddi · Accepted Answer

The problem is that cut produces factors, and these are not handled correctly in the data.table by operation. This is a bug and should be reported: the factor levels should be handled the same way they are in rbind.data.table or rbindlist. An easy fix for your original expression is to convert the result to character:

# as.character() drops the per-group factor levels before the groups are combined
data[, bucket := as.character(cut(DATE, breaks[B == A, BREAKPOINT], ordered_result = TRUE)),
     by = A]
data
#   A       DATE     bucket
#1: 1 2012-01-01         NA
#2: 1 2015-05-30 2015-01-01
#3: 1 2020-01-01         NA
#4: 2 2012-06-30 2012-06-30
#5: 2 2013-06-30         NA
#6: 2 1999-01-01         NA
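
To see why the factor levels matter, here is a minimal base R sketch (no data.table involved) that calls cut once per group with the same dates and breakpoints as in the question. The names d1, d2, b1, b2, f1 and f2, and the tz = "UTC" argument, are my own assumptions for illustration. It only shows that the two groups produce factors sharing the integer code 1 under different labels. I have not checked how data.table combines them internally, so take it as a plausible explanation rather than a description of the actual bug.

```r
# Dates and breakpoints from the question, split by group (tz fixed for reproducibility)
d1 <- as.POSIXct(c("01-01-2012", "30-05-2015", "01-01-2020"), format = "%d-%m-%Y", tz = "UTC")
d2 <- as.POSIXct(c("30-06-2012", "30-06-2013", "01-01-1999"), format = "%d-%m-%Y", tz = "UTC")
b1 <- as.POSIXct(c("01-01-2015", "01-01-2016"), format = "%d-%m-%Y", tz = "UTC")
b2 <- as.POSIXct(c("30-06-2012", "30-06-2013"), format = "%d-%m-%Y", tz = "UTC")

# One factor per group, as the by operation would produce
f1 <- cut(d1, b1, ordered_result = TRUE)
f2 <- cut(d2, b2, ordered_result = TRUE)

# Each factor has a single level, so both use integer code 1 ...
as.integer(f1)
as.integer(f2)

# ... but the label behind that code differs between the groups
levels(f1)
levels(f2)

# Converting to character first makes the per-group results safe to combine
combined <- c(as.character(f1), as.character(f2))
combined
```

If an ordered factor is needed afterwards, the character column can be converted once over the whole table, for example with factor(bucket, ordered = TRUE), so that all rows share a single set of levels.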

Tags:

r

data.table

Fabian Gehring
