Rounding milliseconds of POSIXct in data.table v1.9.2 (ok in 1.8.10)

I get a strange result with data.table v1.9.2:

DT
                 timestamp
1: 2013-01-01 17:51:00.707
2: 2013-01-01 17:51:59.996
3: 2013-01-01 17:52:00.059
4: 2013-01-01 17:54:23.901
5: 2013-01-01 17:54:23.914

str(DT)
Classes ‘data.table’ and 'data.frame':  5 obs. of  1 variable:
 $ timestamp: POSIXct, format: "2013-01-01 17:51:00.707" "2013-01-01 17:51:59.996" "2013-01-01 17:52:00.059" "2013-01-01 17:54:23.901" ...
 - attr(*, "sorted")= chr "timestamp"
 - attr(*, ".internal.selfref")=<externalptr> 

When I apply the duplicated() function I get the following result:

duplicated(DT)
[1] FALSE FALSE FALSE FALSE  TRUE

It is strange that the 5th row is reported as a duplicate of the 4th, since the two timestamps differ by 13 milliseconds. This also prevents me from joining tables in R. Does it have something to do with the POSIXct type?

DT on skydrive : DT

Thanks.

misha_dodic asked Mar 12 '14 15:03


1 Answer

Yes, I reproduced your result with v1.9.2:

library(data.table)

DT <- data.table(timestamp=c(as.POSIXct("2013-01-01 17:51:00.707"),
                             as.POSIXct("2013-01-01 17:51:59.996"),
                             as.POSIXct("2013-01-01 17:52:00.059"),
                             as.POSIXct("2013-01-01 17:54:23.901"),
                             as.POSIXct("2013-01-01 17:54:23.914")))

options(digits.secs=3)  # usually placed in .Rprofile

DT
                 timestamp
1: 2013-01-01 17:51:00.707
2: 2013-01-01 17:51:59.996
3: 2013-01-01 17:52:00.059
4: 2013-01-01 17:54:23.901
5: 2013-01-01 17:54:23.914

duplicated(DT)
## [1] FALSE FALSE FALSE FALSE TRUE

Update for v1.9.3, from Matt

The rounding of numeric columns changed in v1.9.2, and that change affects the milliseconds of POSIXct values. More info here:

Grouping very small numbers (e.g. 1e-28) and 0.0 in data.table v1.8.10 vs v1.9.2

Large integers in data.table. Grouping results different in 1.9.2 compared to 1.8.10
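
To see why rows 4 and 5 collide, here is a rough base R sketch. It assumes that the default setNumericRounding(2) effectively ignores the last 2 bytes (16 bits) of the 52-bit significand of a double. The helpers grid_step() and bucket() are hypothetical names made up for illustration, and tz = "UTC" is an assumption to keep the example reproducible. This only approximates the effect and is not data.table's actual implementation.

```r
# POSIXct is stored as a double: seconds since 1970-01-01 UTC.
ts <- as.POSIXct(c("2013-01-01 17:54:23.901",
                   "2013-01-01 17:54:23.914"), tz = "UTC")
x <- as.numeric(ts)

# Hypothetical helper: spacing between distinguishable values when the
# last `bytes` bytes of the 52-bit significand are ignored.
grid_step <- function(x, bytes) {
  exponent <- floor(log2(x))          # 30 for timestamps from 2013
  2^(exponent - 52) * 2^(8 * bytes)
}

# Hypothetical helper: index of the nearest grid point (a simplification).
bucket <- function(x, bytes) round(x / grid_step(x, bytes))

step2 <- grid_step(x[1], 2)   # 2^-6 seconds, about 15.6 ms
step1 <- grid_step(x[1], 1)   # 2^-14 seconds, about 0.06 ms

same_with_2_bytes <- bucket(x[1], 2) == bucket(x[2], 2)
same_with_1_byte  <- bucket(x[1], 1) == bucket(x[2], 1)
```

In this simplified model the 13 ms gap between rows 4 and 5 is smaller than the roughly 15.6 ms step, and both values land on the same grid point, so they compare as equal. With 1 byte of rounding the step shrinks to about 0.06 ms and the two values stay distinct, which is consistent with the workaround below.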

So the workaround, now available in v1.9.3, is:

> setNumericRounding(1)   # default is 2
> duplicated(DT)
[1] FALSE FALSE FALSE FALSE FALSE

Hope you understand why the change was made and agree that we're going in the right direction.

Of course, you shouldn't have to call setNumericRounding(), that's just a workaround.

I've filed a new item on the tracker :

#5445 numeric rounding should be 0 or 1 automatically for POSIXct

hrbrmstr answered Nov 12 '22 08:11