
Using "by-argument" in "outer" data.table to filter "inner" data.table

Tags: r, data.table

I still have some problems understanding the data.table notation. Could anyone explain why the following is not working?

I'm trying to classify dates into groups using cut. The breaks are stored in another data.table and depend on the by argument of the outer "data" data.table: for each group A, only the rows of breaks where B equals A should be used.

library(data.table)

data <- data.table(A = c(1, 1, 1, 2, 2, 2),
                   DATE = as.POSIXct(c("01-01-2012", "30-05-2015", "01-01-2020", "30-06-2012", "30-06-2013", "01-01-1999"), format = "%d-%m-%Y"))

breaks <- data.table(B = c(1, 1, 2, 2),
                     BREAKPOINT = as.POSIXct(c("01-01-2015", "01-01-2016", "30-06-2012", "30-06-2013"), format = "%d-%m-%Y"))

# not working: breakpoints are looked up per group via the by variable A
data[, bucket := cut(DATE, breaks[B == A, BREAKPOINT], ordered_result = TRUE), by = A]

I can get the desired result doing

# expected
data[A == 1, bucket := cut(DATE, breaks[B == 1, BREAKPOINT], ordered_result = TRUE)]
data[A == 2, bucket := cut(DATE, breaks[B == 2, BREAKPOINT], ordered_result = TRUE)]
data 
#    A       DATE     bucket
# 1: 1 2012-01-01         NA
# 2: 1 2015-05-30 2015-01-01
# 3: 1 2020-01-01         NA
# 4: 2 2012-06-30 2012-06-30
# 5: 2 2013-06-30         NA
# 6: 2 1999-01-01         NA

Thanks, Michael

Fabian Gehring asked Mar 26 '15 15:03


1 Answer

The problem is that cut produces factors, and factor results are not handled correctly in data.table's by operation: each group returns a factor with its own set of levels, and those levels are not reconciled across groups. This is a bug and should be reported; the factor levels should be handled the same way they are in rbind.data.table or rbindlist. An easy fix for your original expression is to convert the result to character:

data[, bucket := as.character(cut(DATE, breaks[B == A, BREAKPOINT], ordered_result = TRUE))
     , by = A]
#   A       DATE     bucket
#1: 1 2012-01-01         NA
#2: 1 2015-05-30 2015-01-01
#3: 1 2020-01-01         NA
#4: 2 2012-06-30 2012-06-30
#5: 2 2013-06-30         NA
#6: 2 1999-01-01         NA
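
To see why per-group factors are awkward to combine, here is a minimal base-R sketch. It does not reproduce data.table's internals; it only runs cut separately on the two groups from the question and compares the results. The variable names, the ISO date strings, and the tz = "UTC" setting are assumptions made for illustration and are not part of the original post.

```r
# Group 1 and group 2 dates and breakpoints, mirroring the question's data
dates_1  <- as.POSIXct(c("2012-01-01", "2015-05-30", "2020-01-01"), tz = "UTC")
breaks_1 <- as.POSIXct(c("2015-01-01", "2016-01-01"), tz = "UTC")
dates_2  <- as.POSIXct(c("2012-06-30", "2013-06-30", "1999-01-01"), tz = "UTC")
breaks_2 <- as.POSIXct(c("2012-06-30", "2013-06-30"), tz = "UTC")

# cut() returns one (ordered) factor per group
f1 <- cut(dates_1, breaks_1, ordered_result = TRUE)
f2 <- cut(dates_2, breaks_2, ordered_result = TRUE)

# Each factor has a single level, but the level labels differ between groups
levels(f1)
levels(f2)

# The underlying integer codes are 1 in both groups, so the codes alone
# cannot tell a group-1 bucket from a group-2 bucket
codes <- c(as.integer(f1), as.integer(f2))

# Converting to character before combining keeps the distinct labels
labels <- c(as.character(f1), as.character(f2))
```

Because both groups encode their only bucket as integer code 1, anything that stitches the group results together has to merge the level sets explicitly. Converting to character sidesteps that, which is what the as.character fix above relies on.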
eddi answered Oct 18 '22 20:10
