
Using multiple criteria in subset function and logical operators

If I want to select a subset of data in R, I can use the subset function. I wanted to base an analysis on data that matched one of a few criteria, e.g. that a certain variable was either 1, 2 or 3. I tried

myNewDataFrame <- subset(bigfive, subset = (bigfive$bf11==(1||2||3)))

It always selected only the values that matched the first of the criteria, here 1. My assumption was that it would start with 1 and, if that evaluated to "false", go on to 2 and then to 3; if none matched, the statement after == would be "false", and if one of them matched, it would be "true".

I got the right result using

 newDataFrame <- subset(bigfive, subset = (bigfive$bf11==c(1,2,3)))

But I would like to be able to select data via logical operators, so: why did the first approach not work?

JanD asked Apr 26 '11 17:04




2 Answers

The correct operator is %in% here. Here is an example with dummy data:

set.seed(1)
dat <- data.frame(bf11 = sample(4, 10, replace = TRUE),
                  foo = runif(10))

giving:

> head(dat)
  bf11       foo
1    2 0.2059746
2    2 0.1765568
3    3 0.6870228
4    4 0.3841037
5    1 0.7698414
6    4 0.4976992

The subset of dat where bf11 equals any of the set 1,2,3 is taken as follows using %in%:

> subset(dat, subset = bf11 %in% c(1,2,3))
   bf11       foo
1     2 0.2059746
2     2 0.1765568
3     3 0.6870228
5     1 0.7698414
8     3 0.9919061
9     3 0.3800352
10    1 0.7774452
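
For contrast, here is a small sketch of why == with a vector of candidates is not a substitute for %in%. The == operator compares element by element and recycles the shorter vector, so it only matches values that happen to line up by position. The vector x is made up for illustration and is not part of the original data.

```r
# Hypothetical values, chosen so that position matters
x <- c(3, 1, 2, 1, 2, 3)

# == recycles c(1, 2, 3) to 1 2 3 1 2 3 and compares position by position
by_position <- x == c(1, 2, 3)

# %in% asks, for each element of x, whether it appears anywhere in the set
by_membership <- x %in% c(1, 2, 3)

print(by_position)    # FALSE FALSE FALSE  TRUE  TRUE  TRUE
print(by_membership)  # TRUE for every element
```

Every value in x is 1, 2 or 3, yet == flags only the last three, so a subset built on it can silently drop matching rows.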

As to why your original didn't work, break it down to see the problem. Look at what 1||2||3 evaluates to:

> 1 || 2 || 3
[1] TRUE

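and you'd get the same using | instead. As a result, the subset() call compares bf11 with TRUE. In a comparison with numbers, TRUE is coerced to 1, so the call only returns rows where bf11 equals 1, which is why only the first of your criteria seemed to match.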
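
A short sketch of that breakdown, using a made-up vector v as a stand-in for the bf11 column (an assumption for illustration only):

```r
# Hypothetical stand-in for the bf11 column
v <- c(1, 2, 3, 4, 1)

rhs <- 1 || 2 || 3      # non-zero numbers count as TRUE, so this is just TRUE
matches <- v == rhs     # TRUE is coerced to 1, so this is the same as v == 1

print(rhs)
print(matches)
print(identical(matches, v == 1))
```

The values 2 and 3 never take part in the comparison, because the || expression has already been reduced to a single logical value before == is evaluated.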

What you could have written would have been something like:

subset(dat, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)

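Which gives the same result as my earlier subset() call. The point is that you need a series of single comparisons, not a comparison against a series of options. But as you can see, %in% is far more useful and less verbose in such circumstances. Notice also that I have to use | rather than ||, as I want to compare each element of bf11 against 1, 2, and 3 in turn. || is not vectorised: in older versions of R it looked only at the first element of each side, and since R 4.3.0 giving it a vector longer than one is an error. Compare (output from an older version of R):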

> with(dat, bf11 == 1 || bf11 == 2)
[1] TRUE
> with(dat, bf11 == 1 | bf11 == 2)
 [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE

Gavin Simpson answered Oct 19 '22 09:10


For your example, I believe the following should work:

myNewDataFrame <- subset(bigfive, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)

See the examples in ?subset for more. Just to demonstrate, a more complicated logical subset would be:

data(airquality)
dat <- subset(airquality, subset = (Temp > 80 & Month > 5) | Ozone < 40)
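
One detail worth knowing with a condition like this: airquality has missing Ozone values, and subset() treats an NA in the condition as FALSE. The sketch below builds the same condition as a plain logical vector to make that visible. The names cond and kept are illustrative, not from the original answer.

```r
data(airquality)

# The same condition as a plain logical vector; it is NA wherever Ozone is
# missing and the first clause is not TRUE
cond <- with(airquality, (Temp > 80 & Month > 5) | Ozone < 40)

kept <- subset(airquality, subset = (Temp > 80 & Month > 5) | Ozone < 40)

# The raw condition contains NA values
print(anyNA(cond))

# subset() keeps only the rows where the condition is TRUE
print(nrow(kept) == sum(cond, na.rm = TRUE))

# which() also drops the NA positions, so plain indexing selects the same rows
print(identical(rownames(kept), rownames(airquality[which(cond), ])))
```

Indexing with airquality[cond, ] directly would instead return extra all-NA rows for the NA positions, which is one reason subset() or which() is convenient here.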

And as Chase points out, %in% would be more efficient in your example:

myNewDataFrame <- subset(bigfive, subset = bf11 %in% c(1, 2, 3))

As Chase also points out, make sure you understand the difference between | and ||. To see help pages for operators, use ?'||', where the operator is quoted.

jthetzel answered Oct 19 '22 11:10