The function max() works correctly on a column that is an ordered factor. However, the same operation fails when the column is grouped with by=.
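For reference, here is a minimal base R sketch of the behaviour the question relies on: max() is defined for ordered factors but not for unordered ones. The variable names ord and unord are made up for illustration.

```r
# Ordered factor: the levels have a defined order, so max() is meaningful
ord <- factor(c("A", "C", "B"), levels = c("A", "B", "C"), ordered = TRUE)
max(ord)  # the highest level present, here C

# Unordered factor: base R raises an error because the levels have no ordering
unord <- factor(c("A", "C", "B"), levels = c("A", "B", "C"))
tryCatch(max(unord), error = function(e) conditionMessage(e))
```

The grouped call in the question ends up with the second kind of error, even though the column is ordered.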
Let's say I have a data.table like this:
library(data.table)
DT <- data.table(ID=rep(1:3, 3), State=sample(LETTERS[1:3], 9, replace=TRUE))
Convert the column State to an ordered factor:
DT[, State := factor(State, levels=LETTERS[1:3], ordered = TRUE)]
This works:
DT[, max(State)]
This fails with an error:
DT[, max(State), by="ID"]
The error is: Error in gmax(State) : max is not meaningful for factors.
How come?
This was a bug that has been fixed in the current development version of data.table.
You can install the development version via:
install.packages('data.table', type = 'source',
repos = 'http://Rdatatable.github.io/data.table')
If this fails, check full details on the Installation wiki.
library(data.table)
# data.table 1.11.5 IN DEVELOPMENT built 2018-08-13 20:20:11 UTC; travis Latest news: r-datatable.com
DT[ , max(State), by="ID"]
# ID V1
# 1: 1 C
# 2: 2 C
# 3: 3 B
For those in controlled/production environments unable to update, you can still sidestep the problem by running:
dt_optim = options(datatable.optimize = 0)
DT[ , max(State), by="ID"]
# resetting afterwards to keep your code running as fast as possible;
# options() returned the previous settings as a list, so pass that list back
options(dt_optim)
The bug came from data.table's internally optimized grouping framework, GForce; the above workaround stops the GForce code from running, so the call falls back to base::max.
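If you would rather not change the option globally, the workaround can be scoped to a single call. The sketch below is one way to do that, assuming the DT, ID and State names from the question; max_state_by_id is a hypothetical helper name, not part of data.table.

```r
library(data.table)

# Hypothetical helper: run the grouped max() with data.table's
# optimizations turned off, then restore the previous setting.
max_state_by_id <- function(dt) {
  # options() returns the old values as a list
  old <- options(datatable.optimize = 0)
  # restore the old values on exit, even if the query below errors
  on.exit(options(old), add = TRUE)
  dt[, max(State), by = "ID"]
}

DT <- data.table(ID = rep(1:3, 3),
                 State = sample(LETTERS[1:3], 9, replace = TRUE))
DT[, State := factor(State, levels = LETTERS[1:3], ordered = TRUE)]

max_state_by_id(DT)
```

Using on.exit() means the rest of your code keeps the optimized code paths, and the setting is restored even when the query fails.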