Let's say I have a data.table and I want to select all the rows where the variable x has a value of b. That is easy
library(data.table)
DT <- data.table(x = rep(c("a","b","c"), each = 3), y = c(1,3,6), v = 1:9)
setkey(DT, x)   # set a 1-column key
DT["b"]
By the way, it appears that one has to set a key: if the key is not set to x, then this does not work. And what would happen if I set two columns as keys?
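(To make that last question concrete, I assume a two-column key would look something like the following; this is an untested sketch reusing my toy DT from above.)

setkey(DT, x, y)    # set a 2-column key instead
DT[J("b", 3)]       # presumably matches rows where x == "b" AND y == 3
DT["b"]             # I believe a single value still joins on the first key column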
Anyway, moving along, let's say that I want to select all the rows where the variable x is a or b.
DT["b"|"a"]
does not work
But the following works
DT[x=="a"|x=="b"]
But that uses vector scanning, à la data.frame; it does not use binary search. I guess for smaller data sets it will not matter.
Is that what I should do or am I ignorant of data.table syntax?
And one more thing. Are there any examples of more complex Boolean multi-variable selection (or subset) procedures with data.table?
I know I could always revert to using the subset() function since a data.table will behave as a data.frame if it must.
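To show the sort of thing I mean, here is a made-up example of a multi-variable selection on my toy DT (both forms are just illustrations; the cut-off v > 5 is arbitrary):

# combine conditions on two columns, data.frame style (vector scan)
DT[x %in% c("a", "b") & v > 5]

# the same selection via base R's subset(), treating DT as a data.frame
subset(DT, x %in% c("a", "b") & v > 5)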
Here is a way that only crossed my mind after I asked the question. It works, but I do not know how it performs in benchmarks; I am not currently at a computer with R installed (I guess I should use a cloud instance). Anyway, I like the syntax:
DT[c("a","b")]
Using the %in% operator seems to give a factor of 2 performance bump. Consider:
library(data.table)
library(rbenchmark)

DT <- data.table(x = sample(letters, 1e6, TRUE), y = rnorm(1e6), v = runif(1e6))
setkey(DT, x)   # set a 1-column key
DT["b"]

f1 <- function() DT[x %in% letters[1:2]]
f2 <- function() DT[x == "a" | x == "b"]

> benchmark(f1(), f2())
  test replications elapsed relative user.self sys.self user.child sys.child
1 f1()          100    8.40 1.000000      7.58     0.81         NA        NA
2 f2()          100   17.11 2.036905     15.54     1.56         NA        NA

> all.equal(f1(), f2())
[1] TRUE
EDIT: Adding Farrel's option
Note, this is on a different computer, but the relative bumps are the same.
f3 <- function() DT[c("a", "b")]

  test replications elapsed  relative user.self sys.self user.child sys.child
1 f1()          100  11.281  7.121843     9.745    1.323          0         0
2 f2()          100  23.106 14.587121    20.824    2.224          0         0
3 f3()          100   1.584  1.000000     1.042    0.541          0         0