How does subsetting with NA work?

Tags:

r

na

subset

Can someone please explain in layman's terms how indexing (subsetting) with NA works? I have found some answers through Google, but I would like to understand it better in simple terms.

When indexing a vector (of length > 1) using a single NA, why does it yield five missing values?

> x <- 1:5
> x[NA]
[1] NA NA NA NA NA
Naveen Gabriel asked Aug 28 '18 14:08


1 Answer

From help("["):

When extracting, a numerical, logical or character NA index picks an unknown element and so returns NA in the corresponding element of a logical, integer, numeric, complex or character result, and NULL for a list.

What does "corresponding element" mean? It follows from the recycling of vector elements. A bare NA is a logical value by default, and logical indices are recycled to the length of the vector, so x[NA] in your example is effectively interpreted as x[c(NA, NA, NA, NA, NA)]. Each element of x is therefore paired with an NA index, and (per the quote above) NA is returned for each of them. In layman's terms: for each element of x, we don't know whether we want it, so an unknown value is returned in its place.
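The recycling behaviour can be checked directly at the console. The following is a minimal sketch that reuses x from the question; the extra index vectors are made up purely for illustration.

```r
x <- 1:5

# A bare NA is a logical constant.
class(NA)
#> [1] "logical"

# The length-one logical index is recycled to length(x), so both forms agree.
identical(x[NA], x[c(NA, NA, NA, NA, NA)])
#> [1] TRUE

# Recycling applies to any short logical index, not only NA.
x[c(TRUE, FALSE)]
#> [1] 1 3 5

# In a full-length logical index, NA appears only where the index is NA.
x[c(TRUE, NA, FALSE, FALSE, TRUE)]
#> [1]  1 NA  5
```

The last call shows the "corresponding element" idea on its own: TRUE keeps an element, FALSE drops it, and NA yields an NA in that position.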

As @r2evans points out: x[NA_integer_] returns only one NA because integer indices are not recycled. In layman's language: We want one value but don't know which one. Thus, one unknown value is returned.
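A short sketch contrasting the two kinds of NA index, again using x from the question; the c(2, NA, 4) index is an illustrative addition.

```r
x <- 1:5

# Logical NA: recycled to the length of x, so five unknown values come back.
length(x[NA])
#> [1] 5

# Integer NA: one position is requested, but which one is unknown.
x[NA_integer_]
#> [1] NA
length(x[NA_integer_])
#> [1] 1

# With positional indices, the result has one element per index given.
x[c(2, NA, 4)]
#> [1]  2 NA  4
```

This is why an NA that sneaks into a vector of positions (for example from match()) produces NA only in the matching slots, rather than blanking the whole result.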

Roland answered Nov 04 '22 03:11