Splitting a data.table with the by-operator: functions that return numeric values and/or NAs fail

Tags: r, data.table

I have a data.table with two columns: an ID column and a value column. I want to split the table by the ID column and run a function foo on the value column. This works fine as long as foo does not return NA. When it does return NA for some group, I get an error telling me that the types of the groups are not consistent. My assumption is that, since is.logical(NA) is TRUE and is.numeric(NA) is FALSE, data.table internally assumes that I want to combine logical values with numeric ones and throws an error. However, I find this behavior peculiar. Any comments on that? Am I missing something obvious here, or is this indeed intended behavior? If so, a short explanation would be great. (Note that I do know a workaround: just let foo2 return a completely improbable number and filter for that later. However, this seems like bad coding.)
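The hypothesis above can be checked directly in base R. A minimal sketch: a bare `NA` is a logical constant, and concatenating it with a number silently coerces it to double, which is why the mismatch usually goes unnoticed outside of a strict per-group type check.

```r
# A bare NA is a logical constant, not a numeric one
typeof(NA)        # "logical"
is.logical(NA)    # TRUE
is.numeric(NA)    # FALSE

# Combining it with a number coerces it to double without complaint
typeof(c(1, NA))  # "double"
```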

Here is the example:

library(data.table)
foo1 <- function(x) {if (mean(x) < 5) {return(1)} else {return(2)}}
foo2 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA)}}
DT <- data.table(ID=rep(c("A", "B"), each=5), value=1:10)
DT[, foo1(value), by=ID] # Works perfectly
#      ID V1
# [1,]  A  1
# [2,]  B  2
DT[, foo2(value), by=ID] # Throws error
# Error in `[.data.table`(DT, , foo2(value), by = ID) : 
# columns of j don't evaluate to consistent types for each group: result for group 2 has column 1 type 'logical' but expecting type 'numeric'
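To see why group 2 is the one that fails, here is a minimal base-R sketch of a strict per-group type check. The helper `apply_by_group_strict` is hypothetical and is NOT data.table's internal code; it only assumes that the first group's result fixes the expected type, which matches the wording of the error above. It reports `typeof()` names, so it says 'double' where data.table says 'numeric'.

```r
# Hypothetical helper, NOT data.table's implementation: evaluate f on each
# group and require every group to return the same type as the first group.
apply_by_group_strict <- function(value, group, f) {
  results <- lapply(split(value, group), f)
  expected <- typeof(results[[1]])
  for (i in seq_along(results)) {
    got <- typeof(results[[i]])
    if (got != expected) {
      stop(sprintf("result for group %d has type '%s' but expecting type '%s'",
                   i, got, expected))
    }
  }
  unlist(results)
}

foo1 <- function(x) {if (mean(x) < 5) {return(1)} else {return(2)}}
foo2 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA)}}
ID <- rep(c("A", "B"), each = 5)
value <- 1:10

apply_by_group_strict(value, ID, foo1)       # returns c(A = 1, B = 2)
try(apply_by_group_strict(value, ID, foo2))  # group 2 is 'logical', not 'double'
```

Group A (mean 3) returns the double `1`, which fixes the expected type; group B (mean 8) returns a logical `NA`, so the check fails on group 2.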


asked Oct 31 '11 23:10

Christoph_J

1 Answer

You can fix this by having your function return NA_real_ rather than a bare NA, which is of the default (logical) type.

foo2 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA)}}
DT[, foo2(value), by=ID] #Throws error
# Error in `[.data.table`(DT, , foo2(value), by = ID) : 
# columns of j don't evaluate to consistent types for each group: 
# result for group 2 has column 1 type 'logical' but expecting type 'numeric'

foo3 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA_real_)}}
DT[, foo3(value), by=ID] #Works
#      ID V1
# [1,]  A  1
# [2,]  B NA
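If you would rather not edit every return statement, one alternative is to coerce the function's result once. This is a minimal sketch under that assumption: `as_double_result` is a hypothetical wrapper, not part of data.table, and it only helps when every non-NA result is meant to be numeric.

```r
# Hypothetical wrapper (not part of data.table): force a function's result
# to double, so a bare logical NA becomes NA_real_.
as_double_result <- function(f) {
  function(x) as.double(f(x))
}

foo2 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA)}}
foo2_double <- as_double_result(foo2)

typeof(foo2(6:10))                  # "logical"
typeof(foo2_double(6:10))           # "double"
identical(as.double(NA), NA_real_)  # TRUE
```

Because `foo2_double` returns the same types as `foo3`, `DT[, foo2_double(value), by=ID]` should behave like the `foo3` call. Hard-coding `NA_real_` is still the more explicit choice, since the wrapper would also silently turn, say, a character result into `NA` with a warning.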

Incidentally, the error message that foo2() produces is nicely informative: it tells you that your NA is of the wrong type. To fix the problem, you just need to use the NA constant of the right type (or class):

NAs <- list(NA, NA_integer_, NA_real_, NA_character_, NA_complex_)
data.frame(constantName = sapply(NAs, deparse), 
           class        = sapply(NAs, class),
           type         = sapply(NAs, typeof))

#    constantName     class      type
# 1            NA   logical   logical
# 2   NA_integer_   integer   integer
# 3      NA_real_   numeric    double
# 4 NA_character_ character character
# 5   NA_complex_   complex   complex
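Instead of looking the constant up in the table, you can also derive a typed NA from an example value. A minimal sketch, assuming a hypothetical helper `na_like`: indexing with `NA_integer_` (rather than a bare `NA`, which would be recycled to `length(x)`) returns exactly one missing element of `x`'s type, and `unname()` drops the `<NA>` name a named vector would otherwise carry.

```r
# Hypothetical helper: return a length-one NA of the same type as `x`.
na_like <- function(x) unname(x[NA_integer_])

typeof(na_like(1))                 # "double"
typeof(na_like(1L))                # "integer"
typeof(na_like("a"))               # "character"
identical(na_like(2.5), NA_real_)  # TRUE

# Template on the type of the OTHER branch's result, not on the input:
foo4 <- function(x) {if (mean(x) < 5) {return(1)} else {return(na_like(1))}}
typeof(foo4(6:10))                 # "double"
```

Note that `na_like(x)` inside `foo4` would be the wrong choice here: `value` is an integer vector (`1:10`), so it would produce `NA_integer_` and clash with the double `1` returned by the other branch.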


answered Oct 18 '22 07:10

Josh O'Brien
