Can someone please explain in layman's terms how indexing (subsetting) with NA works? Even though there are some answers on Google, I would like to understand it better in simple terms.
When indexing a vector (of length > 1) using a single NA, why does it yield five missing values?
> x <- 1:5
> x[NA]
[1] NA NA NA NA NA
There are two easy methods to select the columns of an R data frame that have no missing values. The first generally returns a data frame (or a plain vector if only one column qualifies) and the second returns a matrix. For example, if we have a data frame called df, then the first method is df[, colSums(is.na(df)) == 0] and the second is t(na.omit(t(df))).
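A minimal sketch of both methods, assuming a small, hypothetical all-numeric data frame (the column names a, b and c are made up for illustration). Checking the class of each result shows what kind of object comes back:

```r
# Hypothetical data frame: only column b contains a missing value
df <- data.frame(a = 1:3, b = c(1, NA, 3), c = c(4, 5, 6))

# Method 1: keep the columns whose count of NA values is zero
complete_cols <- df[, colSums(is.na(df)) == 0]
is.data.frame(complete_cols)   # TRUE here, because two columns remain
names(complete_cols)           # "a" "c"

# Method 2: transpose, drop the rows (the original columns) that
# contain NA, then transpose back; the result is a matrix
as_matrix <- t(na.omit(t(df)))
is.matrix(as_matrix)           # TRUE
colnames(as_matrix)            # "a" "c"
```

If only one column survived, the first method would drop to a plain vector unless drop = FALSE is passed.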
Subsetting in R is a useful indexing feature for accessing object elements. It can be used to select and filter variables and observations. You can use brackets to select rows and columns from your data frame.
There are three subsetting operators: [[, [, and $. Subsetting operators interact differently with different vector types (e.g., atomic vectors, lists, factors, matrices, and data frames). Subsetting can be combined with assignment.
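As a rough illustration of how the three operators differ, here is a sketch on a small list (the list lst and its element names are hypothetical):

```r
# Hypothetical list with two named elements
lst <- list(a = 1:3, b = "text")

sub_single <- lst["a"]     # [ keeps the container: a list of length 1
sub_double <- lst[["a"]]   # [[ extracts the element itself: 1 2 3
sub_dollar <- lst$a        # $ is shorthand for [[ with a name

# Subsetting combined with assignment: replace the second value of a
lst$a[2] <- 99L
lst$a                      # 1 99 3
```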
The way you tell R that you want to select some particular elements (i.e., a 'subset') from a vector is by placing an 'index vector' in square brackets immediately following the name of the vector. For a simple example, try x[1:10] to view the first ten elements of x.
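For instance, an index vector can hold positions, negative positions, or logical values. A short sketch, assuming a made-up vector v:

```r
# Hypothetical vector of ten values
v <- seq(10, 100, by = 10)

first_three <- v[1:3]      # positive integers select positions: 10 20 30
without_first <- v[-1]     # negative integers drop positions
large_values <- v[v > 50]  # logical index keeps elements where TRUE: 60 ... 100
```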
From help("["):
When extracting, a numerical, logical or character NA index picks an unknown element and so returns NA in the corresponding element of a logical, integer, numeric, complex or character result, and NULL for a list.
What does "corresponding element" mean? This can be understood if you know about recycling of vector elements. x[NA] (this is a logical NA by default) in your example is actually "interpreted" as x[c(NA, NA, NA, NA, NA)], since logical indices are recycled. So each element of x has a corresponding NA during subsetting and thus (per the quote above) NA is returned for each element of x. In layman's language: for each element of x, we don't know if we want it. Thus an unknown value is returned for each element.
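The recycling can be checked directly in the console. A small sketch, using the same x as in the question:

```r
x <- 1:5

typeof(NA)                                    # "logical"

# A single logical NA behaves like five of them
same_as_five <- identical(x[NA], x[c(NA, NA, NA, NA, NA)])   # TRUE

# A single TRUE is recycled in the same way and selects everything
all_selected <- identical(x[TRUE], x)                        # TRUE

# A length-2 logical index is recycled to TRUE NA TRUE NA TRUE
mixed <- x[c(TRUE, NA)]                       # 1 NA 3 NA 5
```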
As @r2evans points out, x[NA_integer_] returns only one NA because integer indices are not recycled. In layman's language: we want one value but don't know which one. Thus, one unknown value is returned.
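For comparison, a quick sketch of the integer case with the same x:

```r
x <- 1:5

one_na <- x[NA_integer_]          # a single NA
length(one_na)                    # 1

# Each integer index asks for exactly one element, known or not
picked <- x[c(1L, NA, 3L)]        # 1 NA 3
```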