I have a data.table
with two columns: one ID
column and one value
column. I want to split up the table by the ID
column and run a function foo
on the value
column. This works fine as long as foo
does not return NAs. In that case, I get an error that tells me that the types of the groups are not consistent. My assumption is that - since is.logical(NA)
equals TRUE
and is.numeric(NA)
equals FALSE
, data.table
internally assumes that I want to combine logical values with numeric ones and returns an error. However, I find this behavior peculiar. Any comments on that? Do I miss something obvious here or is that indeed intended behavior? If so, a short explanation would be great. (Notice that I do know a work-around: just let foo2
return a complete improbable number and filter for that later. However, this seems bad coding).
Here is the example:
library(data.table)
foo1 <- function(x) {if (mean(x) < 5) {return(1)} else {return(2)}}
foo2 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA)}}
DT <- data.table(ID=rep(c("A", "B"), each=5), value=1:10)
DT[, foo1(value), by=ID] #Works perfectly
ID V1
[1,] A 1
[2,] B 2
DT[, foo2(value), by=ID] #Throws error
Error in `[.data.table`(DT, , foo2(value), by = ID) :
columns of j don't evaluate to consistent types for each group: result for group 2 has column 1 type 'logical' but expecting type 'numeric'
setDT converts lists (both named and unnamed) and data. frames to data. tables by reference. This feature was requested on Stackoverflow.
In R, missing values are represented by the symbol NA (not available). Impossible values (e.g., dividing by zero) are represented by the symbol NaN (not a number). Unlike SAS, R uses the same symbol for character and numeric data.
Table function (table())in R performs a tabulation of categorical variable and gives its frequency as output. It is further useful to create conditional frequency table and Proportinal frequency table. This recipe demonstrates how to use table() function to create the following two tables: Frequency table.
To use mutate() , pass it a series of names followed by R expressions. mutate() will return a copy of your table that contains one column for each name that you pass to mutate() .
You can fix this by specifying that your function should return an NA_real_
, rather than an NA
of the default type.
foo2 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA)}}
DT[, foo2(value), by=ID] #Throws error
# Error in `[.data.table`(DT, , foo2(value), by = ID) :
# columns of j don't evaluate to consistent types for each group:
# result for group 2 has column 1 type 'logical' but expecting type 'numeric'
foo3 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA_real_)}}
DT[, foo3(value), by=ID] #Works
# ID V1
# [1,] A 1
# [2,] B NA
Incidentally the message that foo2()
gives when it fails is nicely informative. It essentially tells you that your NA is of the wrong type. To fix the problem, you just need to look for the NA
constant of the right type (or class):
NAs <- list(NA, NA_integer_, NA_real_, NA_character_, NA_complex_)
data.frame(contantName = sapply(NAs, deparse),
class = sapply(NAs, class),
type = sapply(NAs, typeof))
# contantName class type
# 1 NA logical logical
# 2 NA_integer_ integer integer
# 3 NA_real_ numeric double
# 4 NA_character_ character character
# 5 NA_complex_ complex complex
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With