Why is "grep" causing problems in the data.table calls below?
library(data.table)
set.seed(45)
dt <- data.table(
  col1 = sample(letters[1:2], 10, replace = TRUE),
  col2 = sample(letters[1:5], 10, replace = TRUE),
  col3 = runif(10, 1, 5))
Subsetting like this works:
dt[col1=="b" & col2=="b",] # Works
col1 col2 col3
1: b b 1.5166
But this either throws a warning and returns the wrong rows, or returns the wrong rows with no warning at all:
dt[grep("b", col1) & col2=="b",] # does not
# with seed = 42
Warning message:
In grep("b", col1) & col2 == "b" :
  longer object length is not a multiple of shorter object length
# with seed = 45
col1 col2 col3
1: b b 1.516600
2: a b 3.342007
3: a b 1.865772
I can avoid this confusion by chaining the subsets:
dt[grep("b", col1),][col2=="b",]
But that is not very elegant.
P.S. I guess this problem is different from the one here
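The output of grep is a numeric vector of match positions. Its length can be anywhere from 0 to the length of the original vector, depending on how many matches there are. When that vector is combined with col2=="b" using &, the positions are coerced to logical (every non-zero value becomes TRUE) and recycled to the longer length, so the result no longer reflects which rows matched. The warning appears only when the longer length is not a multiple of the shorter one. grepl, by contrast, returns a logical vector that is always the same length as the original vector. If there are no matches, the only difference is that it will be all FALSE. So the code below should work fine.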
dt[grepl("b", col1) & col2=="b"]
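To see the difference outside of data.table, here is a minimal base R sketch. The vectors col1 and col2 below are made-up values chosen only for illustration (they are not the data from the question), and the names idx, mask, wrong and right are hypothetical.

```r
# Hypothetical example vectors (not the question's data)
col1 <- c("b", "a", "a", "b", "a", "b")
col2 <- c("b", "b", "c", "a", "b", "b")

idx  <- grep("b", col1)   # integer positions of the matches: 1 4 6
mask <- grepl("b", col1)  # logical vector, one element per input element

# `&` coerces the positions to logical (any non-zero value is TRUE) and
# recycles them to length 6. Because 6 is a multiple of 3, no warning is
# raised, and the grep condition silently stops filtering anything.
wrong <- idx & col2 == "b"

# With grepl both sides are logical and have the same length.
right <- mask & col2 == "b"

which(wrong)  # 1 2 5 6 (every position where col2 == "b")
which(right)  # 1 6     (positions where both conditions hold)
```

If the positions from grep are needed for some other reason, one option is to intersect them with which(col2 == "b") rather than combining them with &.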