Using multiple criteria in subset function and logical operators

Tags: r, logical-operators, operator-precedence, subset

If I want to select a subset of data in R, I can use the subset function. I wanted to base an analysis on data that matched one of several criteria, e.g. that a certain variable was either 1, 2 or 3. I tried

myNewDataFrame <- subset(bigfive, subset = (bigfive$bf11==(1||2||3)))

It always selected only the rows that matched the first criterion, here 1. I assumed it would start with 1, and if that evaluated to FALSE it would go on to 2 and then to 3; if none of them matched, the expression after == would be FALSE, and if one of them matched, it would be TRUE.

I got the right result using

 newDataFrame <- subset(bigfive, subset = (bigfive$bf11==c(1,2,3)))

But I would like to be able to select data using logical operators. Why did the first approach not work?


asked Apr 26 '11 17:04

JanD

2 Answers

The correct operator is %in% here. Here is an example with dummy data:

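## The output shown below was generated with R < 3.6.0. R 3.6.0 changed how sample()
## draws values, so in newer R use set.seed(1, sample.kind = "Rounding") to reproduce it
set.seed(1)
dat <- data.frame(bf11 = sample(4, 10, replace = TRUE),
                  foo = runif(10))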

giving:

> head(dat)
  bf11       foo
1    2 0.2059746
2    2 0.1765568
3    3 0.6870228
4    4 0.3841037
5    1 0.7698414
6    4 0.4976992

The subset of dat where bf11 equals any of the set 1,2,3 is taken as follows using %in%:

> subset(dat, subset = bf11 %in% c(1,2,3))
   bf11       foo
1     2 0.2059746
2     2 0.1765568
3     3 0.6870228
5     1 0.7698414
8     3 0.9919061
9     3 0.3800352
10    1 0.7774452
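The bf11 == c(1,2,3) version from the question only appears to work. A minimal sketch of why, using a small hypothetical vector in place of bigfive$bf11: == does not test membership, it recycles the shorter vector and compares element by element, whereas %in% checks each element against the whole set.

```r
# Hypothetical values standing in for bigfive$bf11
bf11 <- c(3, 1, 2, 1, 4, 2)

# %in% asks of every element: "is it one of 1, 2, 3?"
in_set <- bf11 %in% c(1, 2, 3)

# == recycles c(1, 2, 3) along bf11: element 1 vs 1, element 2 vs 2,
# element 3 vs 3, element 4 vs 1 again, and so on
recycled <- bf11 == c(1, 2, 3)

in_set     # TRUE  TRUE  TRUE  TRUE FALSE  TRUE
recycled   # FALSE FALSE FALSE  TRUE FALSE FALSE
```

When the column length is not a multiple of 3, R additionally warns that the longer object length is not a multiple of the shorter one.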

As to why your original didn't work, break it down to see the problem. Look at what 1||2||3 evaluates to:

> 1 || 2 || 3
[1] TRUE

and you'd get the same result using | instead. Your condition therefore becomes bf11 == TRUE. Because bf11 is numeric, TRUE is coerced to 1 for the comparison, so subset() only returns rows where bf11 equals 1, which is exactly what you observed.
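A small sketch of that coercion, assuming a hypothetical bf11 vector made of the first six values printed above: the || chain collapses to a single TRUE, and comparing numbers with TRUE is the same as comparing them with 1.

```r
# Hypothetical values: the first six bf11 entries printed above
bf11 <- c(2, 2, 3, 4, 1, 4)

# || collapses everything to a single TRUE before == is ever applied
combined <- 1 || 2 || 3
combined                       # TRUE

# Comparing a number with TRUE coerces TRUE to 1
TRUE == 1                      # TRUE

# So the original condition is simply bf11 == 1
identical(bf11 == combined, bf11 == 1)   # TRUE
which(bf11 == combined)                  # 5
```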

What you could have written instead is something like:

subset(dat, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)

This gives the same result as my earlier subset() call. The point is that you need a series of single comparisons, not a comparison against a series of options. But as you can see, %in% is far more useful and less verbose in such circumstances. Notice also that I have to use | because I want to compare each element of bf11 against 1, 2 and 3 in turn. Compare:

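> with(dat, bf11 == 1 || bf11 == 2)
[1] TRUE
> with(dat, bf11 == 1 | bf11 == 2)
 [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE

In R 4.3.0 and later, the first call signals an error instead of silently returning TRUE, because || no longer accepts arguments longer than one element.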
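The "series of single comparisons" idea generalizes to any number of values. A minimal sketch, assuming a hypothetical bf11 vector holding the ten values of the column above: build one == comparison per wanted value and fold them together with | using base R's Reduce().

```r
# Hypothetical: the ten bf11 values from the example data above
bf11 <- c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1)
wanted <- c(1, 2, 3)

# One == comparison per wanted value, each a logical vector as long as bf11
comparisons <- lapply(wanted, function(v) bf11 == v)

# Combine them element-wise with |, i.e. bf11 == 1 | bf11 == 2 | bf11 == 3
chained <- Reduce(`|`, comparisons)

identical(chained, bf11 %in% wanted)   # TRUE
which(chained)                         # 1 2 3 5 8 9 10
```

One difference appears only with missing values: for an NA in bf11 the chain of == yields NA, while %in% yields FALSE. subset() drops both kinds of rows.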


answered Oct 19 '22 09:10

Gavin Simpson

For your example, I believe the following should work:

myNewDataFrame <- subset(bigfive, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)

See the examples in ?subset for more. Just to demonstrate, a more complicated logical subset would be:

data(airquality)
dat <- subset(airquality, subset = (Temp > 80 & Month > 5) | Ozone < 40)
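Two details matter in a compound condition like this one. A short sketch using the built-in airquality data: & binds more tightly than |, so the parentheses above document intent rather than change the result, and because Ozone contains missing values the condition is NA for some rows, which subset() drops while plain [ indexing returns them as all-NA rows.

```r
data(airquality)

cond_paren    <- with(airquality, (Temp > 80 & Month > 5) | Ozone < 40)
cond_no_paren <- with(airquality, Temp > 80 & Month > 5 | Ozone < 40)

# & has higher precedence than |, so both conditions are identical
identical(cond_paren, cond_no_paren)   # TRUE

# Missing Ozone values make some conditions NA
any(is.na(cond_paren))                 # TRUE

# subset() keeps only rows where the condition is TRUE
dat <- subset(airquality, subset = cond_paren)
nrow(dat) == sum(cond_paren, na.rm = TRUE)                      # TRUE

# [ indexing also returns one all-NA row for every NA in the condition
nrow(airquality[cond_paren, ]) == sum(cond_paren | is.na(cond_paren))   # TRUE
```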

And as Chase points out, %in% would be more efficient in your example:

myNewDataFrame <- subset(bigfive, subset = bf11 %in% c(1, 2, 3))

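As Chase also points out, make sure you understand the difference between | and ||: | compares vectors element by element, whereas || looks at a single value on each side and stops evaluating as soon as the result is known. To see help pages for operators, quote the operator, e.g. ?'||'.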
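A quick way to see both behaviours, sketched with a hypothetical vector x: the right-hand side of || and && is skipped once the answer is known, while | and & evaluate both sides and work element by element.

```r
# || and && take single values and short-circuit
TRUE || stop("never evaluated")    # TRUE, right-hand side skipped
FALSE && stop("never evaluated")   # FALSE, right-hand side skipped

# | and & are vectorised: both sides are evaluated, element by element
x <- c(1, 5, 2)                    # hypothetical values
x == 1 | x == 2                    # TRUE FALSE TRUE
```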

answered Oct 19 '22 11:10

jthetzel
