How to filter/subset a data.frame using values from one of its columns [duplicate]

Tags:

dataframe

r

How can I "truncate" a data.frame based on the values in a single column? For example, if I have this data.frame

x <- c(5,1,3,2,4)
y <- c(1,5,3,4,2)
data <- data.frame(x,y)

and I want all rows where x is greater than or equal to 2, how would I do that? I know that I can find the addresses of x-values using

addresses <- which(x>=2)

but I'm not sure how to use this to make a new data.frame. The following do not work:

data2 <- data[x>=2]
data2 <- data[which(x>=2)]

If anyone can offer any advice, I'd really appreciate it.

Thomas asked Mar 04 '13 16:03


1 Answer

You're not reading the error messages closely enough. Here, the error message tells you that you have not selected any valid columns: a single index without a comma is treated as a column selector, so your row condition is being used to pick columns.

> data[which(x>=2)]
Error in `[.data.frame`(data, which(x >= 2)) : undefined columns selected

Since you want to return all columns, just put a comma in (indicating that you want all columns returned), and you should be all set.

> data[which(x>=2), ] # if x is in your workspace
  x y
1 5 1
3 3 3
4 2 4
5 4 2
> ## with(data, data[x >= 2, ]) # if x is not in your workspace
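
As a rough sketch, the same row filter can also be written so that the condition always refers to the column inside the data.frame rather than to a vector in the workspace. The result names by_dollar, by_with, and by_subset are only illustrative; subset() and with() are base R.

```r
# Build the data.frame without leaving x and y in the workspace
data <- data.frame(x = c(5, 1, 3, 2, 4), y = c(1, 5, 3, 4, 2))

# Logical indexing on the column itself; the trailing comma keeps all columns
by_dollar <- data[data$x >= 2, ]

# with() evaluates the condition using the columns of data
by_with <- with(data, data[x >= 2, ])

# subset() also looks up x inside data, and drops rows where the condition is NA
by_subset <- subset(data, x >= 2)
```

All three should return the same four rows as the which() version above. subset() is convenient interactively, while explicit indexing with data$x tends to be the safer choice inside functions.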

Here's another point to note: You can make your data.frame directly like this:

data <- data.frame(x = c(5,1,3,2,4), y = c(1,5,3,4,2))

Here's why I suggest this. First, there are no unnecessary objects in your workspace. Second, you aren't fooled into thinking something is working when it isn't. You wrote that: "I know that I can find the addresses of x-values using addresses <- which(x>=2)". True, but what you perhaps didn't realize (hence this question) is that you aren't actually accessing the "x" from your data.frame but the "x" vector in your workspace.
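
To make that pitfall concrete, here is a small sketch of what can go wrong once the data.frame and the workspace vector stop agreeing. The reordering step and the names wrong and right are assumptions added purely for illustration; they are not part of the original question.

```r
x <- c(5, 1, 3, 2, 4)
y <- c(1, 5, 3, 4, 2)
data <- data.frame(x, y)

# Reorder the data.frame by x; the workspace vector x is left untouched
data <- data[order(data$x), ]

# Condition built from the workspace x: picks positions from the OLD row order
wrong <- data[which(x >= 2), ]

# Condition built from the column: picks the rows that really have x >= 2
right <- data[data$x >= 2, ]

wrong$x  # 1 3 4 5 (keeps the row with x = 1, drops the row with x = 2)
right$x  # 2 3 4 5
```

No error is raised in the wrong case, which is why referring to data$x (or using with()) is the safer habit.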

A5C1D2H2I1M1N2O1R2T1 answered Oct 01 '22 15:10