How to subset an R data frame with condition based on only one value from categorical column? First of all, create a data frame. Then, subset the data frame with condition using filter function of dplyr package.
Multiple conditions can also be combined using which() method in R. The which() function in R returns the position of the value which satisfies the given condition. The %in% operator is used to check a value in the vector specified.
Subsetting both rows and columnsIt is possible to subset both rows and columns using the subset function. The select argument lets you subset variables (columns).
To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frameA has variables that data frameB does not, then either: Delete the extra variables in data frameA or.
my.data.frame <- subset(data , V1 > 2 | V2 < 4)
An alternative solution that mimics the behavior of this function and would be more appropriate for inclusion within a function body:
new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ]
Some people criticize the use of which
as not needed, but it does prevent the NA
values from throwing back unwanted results. The equivalent (.i.e not returning NA-rows for any NA's in V1 or V2) to the two options demonstrated above without the which
would be:
new.data <- data[ !is.na(data$V1 | data$V2) & ( data$V1 > 2 | data$V2 < 4) , ]
Note: I want to thank the anonymous contributor that attempted to fix the error in the code immediately above, a fix that got rejected by the moderators. There was actually an additional error that I noticed when I was correcting the first one. The conditional clause that checks for NA values needs to be first if it is to be handled as I intended, since ...
> NA & 1
[1] NA
> 0 & NA
[1] FALSE
Order of arguments may matter when using '&".
You are looking for "|." See http://cran.r-project.org/doc/manuals/R-intro.html#Logical-vectors
my.data.frame <- data[(data$V1 > 2) | (data$V2 < 4), ]
Just for the sake of completeness, we can use the operators [
and [[
:
set.seed(1)
df <- data.frame(v1 = runif(10), v2 = letters[1:10])
Several options
df[df[1] < 0.5 | df[2] == "g", ]
df[df[[1]] < 0.5 | df[[2]] == "g", ]
df[df["v1"] < 0.5 | df["v2"] == "g", ]
df$name is equivalent to df[["name", exact = FALSE]]
Using dplyr
:
library(dplyr)
filter(df, v1 < 0.5 | v2 == "g")
Using sqldf
:
library(sqldf)
sqldf('SELECT *
FROM df
WHERE v1 < 0.5 OR v2 = "g"')
Output for the above options:
v1 v2
1 0.26550866 a
2 0.37212390 b
3 0.20168193 e
4 0.94467527 g
5 0.06178627 j
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With