Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Splitting a data.table with the by-operator: functions that return numeric values and/or NAs fail

Tags:

r

data.table

I have a data.table with two columns: one ID column and one value column. I want to split up the table by the ID column and run a function foo on the value column. This works fine as long as foo does not return NAs. In that case, I get an error that tells me that the types of the groups are not consistent. My assumption is that - since is.logical(NA) equals TRUE and is.numeric(NA) equals FALSE, data.table internally assumes that I want to combine logical values with numeric ones and returns an error. However, I find this behavior peculiar. Any comments on that? Do I miss something obvious here or is that indeed intended behavior? If so, a short explanation would be great. (Notice that I do know a work-around: just let foo2 return a complete improbable number and filter for that later. However, this seems bad coding).

Here is the example:

library(data.table)
foo1 <- function(x) {if (mean(x) < 5) {return(1)} else {return(2)}}
foo2 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA)}}
DT <- data.table(ID=rep(c("A", "B"), each=5), value=1:10)
DT[, foo1(value), by=ID] #Works perfectly
     ID V1
[1,]  A  1
[2,]  B  2
DT[, foo2(value), by=ID] #Throws error
Error in `[.data.table`(DT, , foo2(value), by = ID) : 
columns of j don't evaluate to consistent types for each group: result for group 2 has column 1 type 'logical' but expecting type 'numeric'
like image 974
Christoph_J Avatar asked Oct 31 '11 23:10

Christoph_J


People also ask

What does setDT do in R?

setDT converts lists (both named and unnamed) and data. frames to data. tables by reference. This feature was requested on Stackoverflow.

Is NA function in R?

In R, missing values are represented by the symbol NA (not available). Impossible values (e.g., dividing by zero) are represented by the symbol NaN (not a number). Unlike SAS, R uses the same symbol for character and numeric data.

What is the table function in R?

Table function (table())in R performs a tabulation of categorical variable and gives its frequency as output. It is further useful to create conditional frequency table and Proportinal frequency table. This recipe demonstrates how to use table() function to create the following two tables: Frequency table.

How do you mutate a table in R?

To use mutate() , pass it a series of names followed by R expressions. mutate() will return a copy of your table that contains one column for each name that you pass to mutate() .


1 Answers

You can fix this by specifying that your function should return an NA_real_, rather than an NA of the default type.

foo2 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA)}}
DT[, foo2(value), by=ID] #Throws error
# Error in `[.data.table`(DT, , foo2(value), by = ID) : 
# columns of j don't evaluate to consistent types for each group: 
# result for group 2 has column 1 type 'logical' but expecting type 'numeric'

foo3 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA_real_)}}
DT[, foo3(value), by=ID] #Works
#      ID V1
# [1,]  A  1
# [2,]  B NA

Incidentally the message that foo2() gives when it fails is nicely informative. It essentially tells you that your NA is of the wrong type. To fix the problem, you just need to look for the NA constant of the right type (or class):

NAs <- list(NA, NA_integer_, NA_real_, NA_character_, NA_complex_)
data.frame(contantName = sapply(NAs, deparse), 
           class       = sapply(NAs, class),
           type        = sapply(NAs, typeof))

#     contantName     class      type
# 1            NA   logical   logical
# 2   NA_integer_   integer   integer
# 3      NA_real_   numeric    double
# 4 NA_character_ character character
# 5   NA_complex_   complex   complex
like image 150
Josh O'Brien Avatar answered Oct 18 '22 07:10

Josh O'Brien