 

Grouped mean of difftime fails in data.table

Preface:

I have a column in a data.table of difftime values with units set to days. I am trying to create another data.table summarizing the values with

dt2 <- dt[, .(AvgTime = mean(DiffTime)), by = Group]

When printing the new data.table, I see values such as

1.925988e+00 days
1.143287e+00 days
1.453975e+01 days

I would like to limit the number of decimal places for this column only (i.e. without setting options(), unless that can be done specifically for difftime values). When I try to do this by modifying the call above, e.g.

dt2 <- dt[, .(AvgTime = round(mean(DiffTime), 2)), by = Group]

I am left with NA values, and both the base round() and format() functions produce the warning:

In mean(DiffTime) : argument is not numeric or logical.

Oddly enough, if I perform the same operation on a numeric column, it runs without problems. Also, if I split it into two separate statements, I can accomplish what I am looking to do:

dt2 <- dt[, .(AvgTime = mean(DiffTime)), by = Group]
dt2[, AvgTime := round(AvgTime, 2)]
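For comparison, here is a minimal base-R-only sketch (no data.table involved). It re-creates the three averages printed above by hand as a difftime in days (the vector `x` is an illustration, not data from the question's `dt`) and assumes only base R's `mean.difftime` and the `Math` group method for difftime. It suggests that averaging and rounding a difftime in a single expression is fine in base R, and that it keeps the units.

```r
# Values copied from the printed output above, as a difftime in days.
x <- as.difftime(c(1.925988, 1.143287, 14.53975), units = "days")

# mean() dispatches to mean.difftime, which keeps the class and units.
avg <- mean(x)

# round() goes through the Math group method for difftime, which
# passes `digits` on and re-attaches the units.
avg_rounded <- round(avg, 2)

print(avg_rounded)
```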

Reproducible Example:

library(data.table)
set.seed(1)
dt <- data.table(
  Date1 = 
    sample(seq(as.Date('2017/10/01'), 
               as.Date('2017/10/31'), 
               by="days"), 24, replace = FALSE) +
    abs(rnorm(24)) / 10,
  Date2 = 
    sample(seq(as.Date('2017/10/01'), 
               as.Date('2017/10/31'), 
               by="days"), 24, replace = FALSE) +
    abs(rnorm(24)) / 10,
  Num1 =
    abs(rnorm(24)) * 10,
  Group = 
    rep(LETTERS[1:4], each=6)
)
dt[, DiffTime := abs(difftime(Date1, Date2, units = 'days'))]

# Warnings/NA:
class(dt$DiffTime) # "difftime"
dt2 <- dt[, .(AvgTime = round(mean(DiffTime), 2)), by = .(Group)]

# Works when numeric/not difftime:
class(dt$Num1) # "numeric"
dt2 <- dt[, .(AvgNum = round(mean(Num1), 2)), by = .(Group)]

# Works, but takes an additional step:
dt2 <- dt[, .(AvgTime = mean(DiffTime)), by = .(Group)]
dt2[, AvgTime := round(AvgTime, 2)]

# Works with base::mean:
class(dt$DiffTime) # "difftime"
dt2 <- dt[, .(AvgTime = round(base::mean(DiffTime), 2)), by = .(Group)]

Question:

Why am I not able to complete this conversion (rounding of the mean) in one step when the class is difftime? Am I missing something in my execution? Is this some sort of bug in data.table where it can't properly handle the difftime?

Issue added on GitHub.

Update: The issue appears to be resolved after updating data.table from version 1.10.4 to 1.12.8.

asked Nov 13 '17 at 18:11 by Gaffi


3 Answers

This was fixed by update #3567 on 2019/05/15, which shipped in data.table version 1.12.4 (released 2019/10/03).
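A small sketch for checking whether an installed copy already includes that release. It assumes data.table is installed and uses base R's `packageVersion()`, whose result can be compared directly against a version string; `has_fix` is just an illustrative variable name.

```r
# Compare the installed data.table version with 1.12.4, the release
# this answer cites as containing the fix.
installed <- packageVersion("data.table")
has_fix <- installed >= "1.12.4"

cat("data.table", format(installed), "includes the fix:", has_fix, "\n")
```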

answered Oct 15 '22 at 20:10 by Gaffi


This might be a little late, but if you really want it to work, you can do:

as.numeric(round(as.difftime(difftime(Date1, Date2)), 0))
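Applied to the grouped mean from the question, the same idea (strip the difftime class before averaging) might look like the sketch below. It is a hedged illustration, not the fix shipped in data.table: the toy table stands in for the question's `dt`, and the `AvgDays` column name is hypothetical. Because `mean()` only ever sees a plain numeric vector here, whatever special handling of difftime is involved should not matter.

```r
library(data.table)

# Hypothetical stand-in for the question's `dt`: two groups with
# difftime values in days.
dt <- data.table(
  Group    = rep(c("A", "B"), each = 3),
  DiffTime = as.difftime(c(1, 2, 4, 10, 20, 40), units = "days")
)

# Convert to plain numbers (in days) before averaging, then round,
# all in one grouped step.
dt2 <- dt[, .(AvgDays = round(mean(as.numeric(DiffTime, units = "days")), 2)),
          by = Group]

# Optional: turn the result back into a difftime column in days.
dt2[, AvgTime := as.difftime(AvgDays, units = "days")]

print(dt2)
```

Passing `units = "days"` to `as.numeric()` makes the unit explicit, so the numbers do not depend on which units the difftime happened to be stored in.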
answered Oct 15 '22 at 22:10 by Alexis Drakopoulos


I recently ran into the same problem using data.table 1.11.8. One quick workaround is to call base::mean instead of mean.

answered Oct 15 '22 at 22:10 by Matt Motoki