 

Filter by using %like% between two columns of a data.table

Tags:

r

data.table

Hello stackoverflowers,

I wonder whether I can use the %like% operator row-wise, between two columns of the same data.table.

The following reproducible example should make this clearer.

First, prepare the data:

library(data.table)

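iris <- as.data.table(iris)
# Keep 5 evenly spaced rows; as.integer() makes explicit the truncation of
# the fractional positions (38.25, 75.5, 112.75) that data.table applies anyway
iris <- iris[as.integer(seq.int(from = 1, to = 150, length.out = 5))]
iris[, Species2 := c("set", "set|vers", "setosa", "nothing", "virginica")]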

The dataset now looks as follows:

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  Species2
1:          5.1         3.5          1.4         0.2     setosa       set
2:          4.9         3.6          1.4         0.1     setosa  set|vers
3:          6.4         2.9          4.3         1.3 versicolor    setosa
4:          6.4         2.7          5.3         1.9  virginica   nothing
5:          5.9         3.0          5.1         1.8  virginica virginica

I would like to use something like the following command row-wise.

iris[Species%like%Species2]

However, the comparison is not evaluated row-wise. Is that possible? The result should be rows 1, 2 and 5.

asked Dec 07 '25 09:12 by George Sotiropoulos


2 Answers

One way is to group by row, so that each call to %like% sees only a single pattern:

iris[, .SD[Species %like% Species2], by = 1:5]  # 1:5 gives each of the 5 rows its own group
#   : Sepal.Length Sepal.Width Petal.Length Petal.Width   Species  Species2
#1: 1          5.1         3.5          1.4         0.2    setosa       set
#2: 2          4.9         3.6          1.4         0.1    setosa  set|vers
#3: 5          5.9         3.0          5.1         1.8 virginica virginica

Or, as per @docendodiscimus's comment, if there are duplicate entries you can group by the (Species, Species2) pair instead, so that each distinct combination is tested only once:

iris[, .SD[Species[1L] %like% Species2[1L]], by = .(Species, Species2)]
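Grouping by the pair pays off because each distinct (string, pattern) combination only needs to be tested once. As a rough base-R illustration of that idea (not what data.table does internally), assuming a plain data.frame with the question's two columns plus one hypothetical repeated row, the sketch below runs grepl() once per distinct pair and copies each result back to every row that shares the pair:

```r
# Hypothetical data: the question's two columns plus a repeated sixth row
df <- data.frame(
  Species  = c("setosa", "setosa", "versicolor", "virginica", "virginica", "setosa"),
  Species2 = c("set", "set|vers", "setosa", "nothing", "virginica", "set"),
  stringsAsFactors = FALSE
)

# One key per row; the "\u001f" separator is assumed never to occur in the data
key   <- paste(df$Species, df$Species2, sep = "\u001f")
first <- !duplicated(key)

# Run grepl(pattern, x) once per distinct (string, pattern) pair
hit <- mapply(grepl, df$Species2[first], df$Species[first], USE.NAMES = FALSE)

# Copy each pair's result back to every row that shares that pair
keep <- hit[match(key, key[first])]
df[keep, ]
```

Here the sixth row reuses the result computed for the first row instead of calling grepl() again.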
answered Dec 08 '25 21:12 by LyzandeR


%like% is just a wrapper around grepl, so the pattern (the right-hand side) must have length 1. You should be seeing a warning about this.
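To illustrate the length-1 restriction with base R only, assuming plain character vectors that hold the question's two columns, the sketch below contrasts a single-pattern grepl() call with a wrapper built by Vectorize(), which walks pattern and x in parallel:

```r
x       <- c("setosa", "setosa", "versicolor", "virginica", "virginica")
pattern <- c("set", "set|vers", "setosa", "nothing", "virginica")

# grepl() takes one pattern: "set" is tested against every element of x
first_only <- grepl(pattern[1], x)

# Vectorize() wraps grepl() so that pattern[i] is matched against x[i]
# (it loops via mapply() internally)
vgrepl   <- Vectorize(grepl, vectorize.args = c("pattern", "x"), USE.NAMES = FALSE)
row_wise <- vgrepl(pattern, x)
```

The wrapper still calls grepl() once per element, so on large tables it will likely be slower than a natively vectorised function such as stri_detect_regex.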

The stringi package vectorizes the pattern argument, so element i of the string vector is matched against element i of the pattern vector.

library(stringi)

iris[stri_detect_regex(Species, Species2)]

If you prefer operator syntax to a function call, you can define your own:

`%vlike%` <- function(x, y) {
  stri_detect_regex(x, y)
}

iris[Species %vlike% Species2]
#    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species  Species2
# 1:          5.1         3.5          1.4         0.2    setosa       set
# 2:          4.9         3.6          1.4         0.1    setosa  set|vers
# 3:          5.9         3.0          5.1         1.8 virginica virginica
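If stringi is not available, a similar operator can be sketched in base R. The name %rowlike% is hypothetical, and the sketch assumes that, as with stri_detect_regex, each element on the right is a regular expression for the matching element on the left. Regex semantics matter here: "set|vers" only matches "setosa" because | is alternation, so treating the patterns as literal strings (fixed = TRUE) would drop row 2.

```r
# Hypothetical base-R stand-in for the stringi-based %vlike% operator
`%rowlike%` <- function(x, y) {
  n <- max(length(x), length(y))
  if (n == 0L) return(logical(0))
  # Recycle both sides to a common length; factors become character
  x <- rep_len(as.character(x), n)
  y <- rep_len(as.character(y), n)
  # y[i] is used as a regular expression against x[i]
  mapply(grepl, y, x, USE.NAMES = FALSE)
}

species  <- factor(c("setosa", "setosa", "versicolor", "virginica", "virginica"))
species2 <- c("set", "set|vers", "setosa", "nothing", "virginica")

regex_hits <- species %rowlike% species2

# Literal (non-regex) matching, for comparison: "set|vers" no longer matches
fixed_hits <- mapply(grepl, species2, as.character(species),
                     MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
```

With a data.table, it would be used the same way as the stringi version, e.g. iris[Species %rowlike% Species2].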
answered Dec 08 '25 21:12 by Nathan Werth