How come I can filter a factor variable using a double variable in one case, but not in another?
Example data below:
library(data.table)
dt <- data.table(id = 1:9,
                 var = factor(81:89))
# > dt
# id var
# 1: 1 81
# 2: 2 82
# 3: 3 83
# 4: 4 84
# 5: 5 85
# 6: 6 86
# 7: 7 87
# 8: 8 88
# 9: 9 89
Why does this work...
dt[id %in% 1:7 & var %in% c(82, 84)]
# id var
# 1: 2 82
# 2: 4 84
...but this gives an error?
dt[var %in% c(82, 84)]
# Error in bmerge(i, x, leftcols, rightcols, io <- FALSE, xo, roll = 0, :
# x.'var' is a factor column being joined to i.'V1' which is type 'double'.
# Factor columns must join to factor or character columns.
This seems inconsistent. Could it be a bug?
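The difference is that in the second example data.table optimizes the subset with automatic indexing (the bmerge call named in the error message), and that code path throws the error. In the affected versions only a single == or %in% condition is optimized this way, so the compound condition in the first example is not. You can bypass the optimization for a single call by wrapping the condition in parentheses: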
dt[(var %in% c(82, 84))]
# id var
#1: 2 82
#2: 4 84
Then a base R vector scan is used and the usual coercion rules apply. From help("%in%"):
Factors, raw vectors and lists are converted to character vectors, and then x and table are coerced to a common type
var <- factor(81:89)
var %in% c(82, 84)
#[1] FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
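To make the coercion visible, here is a minimal base R sketch that performs the two conversions by hand and compares the result with the built-in behaviour. The names values, var_chr, values_chr, manual and builtin are only for illustration, and this mirrors the documented rule rather than the actual internals of %in%:

```r
var <- factor(81:89)
values <- c(82, 84)

# Step 1: the factor is converted to its character labels ("81", "82", ...).
var_chr <- as.character(var)

# Step 2: character is the common type, so the doubles become strings too.
values_chr <- as.character(values)

# Matching the character forms by hand...
manual <- var_chr %in% values_chr

# ...agrees with what %in% does on the original objects.
builtin <- var %in% values
identical(manual, builtin)
```

So the factor is matched by its labels, not by its underlying integer codes.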
The problem has been fixed in data.table version 1.9.7.
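If you cannot upgrade, another way to sidestep the type mismatch is to compare the factor column against character values, since the error message itself says factor columns can join to character columns. A minimal sketch, rebuilding the dt from the question (the names wanted and res are only for illustration); I would expect it to behave the same whether or not the optimization kicks in:

```r
library(data.table)

dt <- data.table(id = 1:9,
                 var = factor(81:89))

# Compare the factor against its labels as strings rather than as doubles.
wanted <- c("82", "84")
res <- dt[var %in% wanted]
res
```

This also documents the intent more clearly, because the factor levels are labels rather than numbers.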