 

How to filter a data frame with conditions on two columns? [duplicate]

Tags: dataframe, r

I am trying to select rows from a data frame. Why does the last query below return all 5 records and not just the first two?

> x <- c(5,1,3,2,4)
> y <- c(1,5,3,4,2)
> data <- data.frame(x,y)
> data
  x y
1 5 1
2 1 5
3 3 3
4 2 4
5 4 2
> data[data$x > 4 || data$y > 4]
  x y
1 5 1
2 1 5
3 3 3
4 2 4
5 4 2
fatdragon asked Nov 19 '13 23:11




2 Answers

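(1) For selecting rows (subsetting), I highly recommend the subset function. It is part of base R, so loading the plyr package (written by Hadley Wickham) is not required. It is cleaner and easier to use than bracket indexing:

# subset() is in base R; no library() call is needed
subset(data, x > 4 | y > 4)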

UPDATE:

There is a successor to plyr called dplyr, also from Hadley, which is designed for data frames and is generally faster and easier to use. If you have ever seen an operator like %>% (or the older %.%, which has since been removed from dplyr), it chains operations together; dplyr re-exports %>% from the magrittr package.

library(dplyr)
result <- data %>%
          filter(x > 4 | y > 4)  # NOTE: filter(condition1, condition2, ...) combines conditions with AND
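The AND/OR distinction in the note above can be checked without installing anything. The following is a minimal sketch using base R's subset() on the data from the question; the variable names `either` and `both`, and the `>= 3` threshold in the second call, are chosen here purely for illustration.

```r
x <- c(5, 1, 3, 2, 4)
y <- c(1, 5, 3, 4, 2)
data <- data.frame(x, y)

# OR: keep rows where at least one of the two conditions holds.
either <- subset(data, x > 4 | y > 4)

# AND: keep rows where both conditions hold at the same time.
both <- subset(data, x >= 3 & y >= 3)

print(either)
print(both)
```

With dplyr loaded, filter(data, x >= 3, y >= 3) should select the same rows as the AND call, since comma-separated conditions are combined with AND.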

(2) There indeed exist some differences between | and ||:

You can look at the help manual by doing this: ?'|'

The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.

> c(1,1,0) | c(0,0,0)
[1]  TRUE  TRUE FALSE
> c(1,1,0) || c(0,0,0)
[1] TRUE
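To make the division of labour concrete, here is a small sketch of where each operator fits: | for building a logical vector, and || for a single TRUE/FALSE decision in control flow. Reducing each side with any() first is one way to get a scalar; the variable names are illustrative only.

```r
x <- c(5, 1, 3, 2, 4)
y <- c(1, 5, 3, 4, 2)

# Elementwise OR: one result per pair of elements.
elementwise <- x > 4 | y > 4

# Scalar OR for control flow: reduce each side to a single value first.
if (any(x > 4) || any(y > 4)) {
  message("at least one value exceeds 4")
}

# Short-circuiting: the right-hand side is not evaluated when the left is TRUE,
# so the stop() call below never runs.
short_circuit <- TRUE || stop("never reached")
```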

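In your case, data$x > 4 || data$y > 4 looks only at the first element of each vector (5 > 4 is TRUE), so what you did is basically data[TRUE], which returns the complete data frame. Note that since R 4.3.0, calling || with an operand of length greater than one is an error rather than a silent use of the first element.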
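A minimal sketch of the fix, assuming the same data as in the question: use the vectorized | and put a comma after the condition so that it indexes rows. The names `everything` and `wanted` are illustrative.

```r
x <- c(5, 1, 3, 2, 4)
y <- c(1, 5, 3, 4, 2)
data <- data.frame(x, y)

# A single TRUE index is recycled across the columns, so every column is kept.
everything <- data[TRUE]

# Vectorized | plus a trailing comma: the logical vector now selects rows.
wanted <- data[data$x > 4 | data$y > 4, ]

print(wanted)
```

The empty slot after the comma means "all columns", which is why only the rows are filtered.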

B.Mr.W. answered Oct 12 '22 16:10


Here's something that works for me.

data[data[,1] > 4 | data[,2] > 4,1:2]

Your method differs from this in two ways: it uses ||, which is not vectorized, instead of |, and it has no comma after the condition, so the condition is not used as a row index. Look at help("[").

CCurtis answered Oct 12 '22 16:10