
Filtering on factor variable using double variable in R data.table

Tags: r, data.table

How come I can filter a factor variable using a double variable in one case, but not in another?

Example data below:

library(data.table)

dt <- data.table(id=1:9,
                 var=factor(81:89))

# > dt
#    id var
# 1:  1  81
# 2:  2  82
# 3:  3  83
# 4:  4  84
# 5:  5  85
# 6:  6  86
# 7:  7  87
# 8:  8  88
# 9:  9  89

Why does this work...

dt[id %in% 1:7 & var %in% c(82, 84)]

#    id var
# 1:  2  82
# 2:  4  84

...but this gives an error?

dt[var %in% c(82, 84)]

# Error in bmerge(i, x, leftcols, rightcols, io <- FALSE, xo, roll = 0,  : 
#  x.'var' is a factor column being joined to i.'V1' which is type 'double'.
# Factor columns must join to factor or character columns.

This seems inconsistent. Could it be a bug?

asked Jul 29 '16 13:07 by Bram Visser



1 Answer

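The difference is that the second example, a single %in% condition on one column, is optimized by data.table's automatic indexing. The optimized path performs a join (the bmerge call in the error message), and that join refuses to match a factor column against a double vector. Wrapping the condition in parentheses switches the optimization off for that call: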

dt[(var %in% c(82, 84))]
#   id var
#1:  2  82
#2:  4  84

Without the optimization, a base R vector scan is used and the usual coercion rules apply. From help("%in%"):

Factors, raw vectors and lists are converted to character vectors, and then x and table are coerced to a common type

var <- factor(81:89)
var %in% c(82, 84)
#[1] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
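The coercion described in the help text can be checked directly in base R. The sketch below is illustrative only; the names direct and as_chars are made up for this example. It compares the factor match against an explicit character-to-character match and shows that the factor's internal integer codes are not what gets compared.

```r
# Base R only: how %in% treats a factor on the left and doubles on the right.
var <- factor(81:89)

# Matching the factor directly is equivalent to matching its labels as
# character strings, because factors are converted to character first.
direct   <- var %in% c(82, 84)
as_chars <- as.character(var) %in% as.character(c(82, 84))
identical(direct, as_chars)

# The underlying integer codes (1 to 9) are not what is compared,
# so none of them match 82 or 84.
any(as.integer(var) %in% c(82, 84))
```

Because the comparison happens on character strings, a factor level such as "82.0" would not match the number 82, whose character form is "82".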

The problem has been fixed in data.table version 1.9.7.
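On versions without the fix, another option suggested by the error message itself (factor columns must join to factor or character columns) is to pass the values as character strings. This is a minimal sketch, assuming data.table is installed; res is a name introduced here for illustration.

```r
library(data.table)

dt <- data.table(id = 1:9, var = factor(81:89))

# Character values are an accepted match type for a factor column,
# so no parentheses are needed around the condition.
res <- dt[var %in% c("82", "84")]
res$id
```

This keeps the query eligible for automatic indexing instead of forcing a vector scan.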

answered Oct 22 '22 12:10 by Roland