Say I have a data frame df
and want to subset it based on the value of column a.
df <- data.frame(a = 1:4, b = 5:8)
df
Is it necessary to include a which
function in the brackets or can I just include the logical test?
df[df$a == "2",]
# a b
#2 2 6
df[which(df$a == "2"),]
# a b
#2 2 6
It seems to work the same either way... I was getting some strange results in a large data frame (i.e., getting empty rows returned as well as the correct ones) but once I cleaned the environment and reran my script it worked fine.
Subsetting in R is an indexing feature for accessing the elements of an object. It can be used to select and filter variables and observations. You can use brackets to select rows and columns from your data frame.
There are three subsetting operators: [[, [, and $. They interact differently with different object types (e.g., atomic vectors, lists, factors, matrices, and data frames). Subsetting can also be combined with assignment.
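As a rough illustration of those three operators, here is a small sketch using the df from the question. The names col_df, col_vec and same are made up for the example.

```r
df <- data.frame(a = 1:4, b = 5:8)

col_df  <- df["a"]     # `[` keeps the container: a one-column data frame
col_vec <- df[["a"]]   # `[[` extracts the element: an integer vector
same    <- df$a        # `$` gives the same vector as df[["a"]] here

# Subsetting combined with assignment: overwrite b where a is greater than 2
df$b[df$a > 2] <- 0L
df
```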
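df$a == "2"
returns a logical vector, while which(df$a == "2")
returns the integer positions where the test is TRUE. If the vector contains missing values, the logical test yields NA at those positions, and indexing with NA returns an NA element (for a data frame, a row filled with NA), which would explain the empty rows you saw. which
drops the NAs, so only the real matches come back.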
For example:
x <- c(1, NA, 2, 10)
x[x == 2]
# [1] NA  2
x[which(x == 2)]
# [1] 2
x == 2
# [1] FALSE    NA  TRUE FALSE
which(x == 2)
# [1] 3
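The same thing happens with a data frame. The sketch below assumes a variant of the question's df with an NA placed in column a (the question's data has no missing values). The names by_logical, by_which, by_guard and by_subset are only for illustration.

```r
df <- data.frame(a = c(1, NA, 2, 10), b = 5:8)

by_logical <- df[df$a == 2, ]                 # NA in the test yields an all-NA row
by_which   <- df[which(df$a == 2), ]          # which() keeps only TRUE positions
by_guard   <- df[!is.na(df$a) & df$a == 2, ]  # explicit NA guard, still a logical index
by_subset  <- subset(df, a == 2)              # subset() treats NA in the condition as FALSE

nrow(by_logical)  # 2: the all-NA row plus the real match
nrow(by_which)    # 1: only the real match
```

If you would rather not use which, the explicit is.na guard or subset() gives the same single row. One thing to watch for with which: negating it as df[-which(cond), ] returns zero rows when nothing matches, because -integer(0) selects nothing.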