How to subset data in R without losing NA rows?

Tags: dataframe, r, na, subset

I have some data that I am looking at in R. One particular column, titled "Height", contains a few rows of NA.

I am looking to subset my data-frame so that all Heights above a certain value are excluded from my analysis.

df2 <- subset(df1, Height < 40)

However, whenever I do this, R automatically removes all rows that contain NA values for Height. I do not want this. I have tried adding an na.rm argument:

f1 <- function(x, na.rm = FALSE) {
  # na.rm is accepted here but never used in the body
  df2 <- subset(x, Height < 40)
}
f1(df1, na.rm = FALSE)

but this does not seem to do anything; the rows with NA still disappear from my data frame. Is there a way of subsetting my data like this without losing the NA rows?


asked Nov 06 '16 05:11

Ryan Rothman

1 Answer

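If we decide to use the subset function, then we need to watch out. Its documentation (?subset) says that missing values in the condition are taken as false, and for vectors it spells this out:

For ordinary vectors, the result is simply ‘x[subset & !is.na(subset)]’.

So only rows where the condition is TRUE are retained; rows where it evaluates to NA are dropped, exactly like rows where it is FALSE.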
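A quick check of that documented rule on a plain vector (the names h, cond, via_subset and via_rule are only for illustration):

```r
h <- c(NA, 2, 4, NA, 50, 60)
cond <- h < 40                        # values: NA TRUE TRUE NA FALSE FALSE

# subset() on an ordinary vector ...
via_subset <- subset(h, h < 40)
# ... gives the same result as indexing with the NA entries forced to FALSE
via_rule <- h[cond & !is.na(cond)]

via_subset                            # 2 4: both NA heights are gone
identical(via_subset, via_rule)       # TRUE
```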

If you want to keep NA cases, add a logical "or" condition that tells R not to drop them:

subset(df1, Height < 40 | is.na(Height))
# or `df1[df1$Height < 40 | is.na(df1$Height), ]`
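The f1 in the question accepted na.rm but never used it, so that argument could not change anything. As a minimal sketch of a wrapper that does honor a flag, here is a hypothetical rework of f1; cutoff and keep_na are illustrative argument names (not options of subset or any other R function), and the column name Height is hard-coded as in the question:

```r
# Hypothetical rework of the question's f1 (argument names are illustrative)
f1 <- function(x, cutoff = 40, keep_na = TRUE) {
  cond <- x$Height < cutoff                  # TRUE / FALSE / NA for each row
  if (keep_na) {
    cond <- cond | is.na(x$Height)           # rows with NA Height become TRUE
  }
  x[cond & !is.na(cond), , drop = FALSE]     # any remaining NA counts as FALSE
}

df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)
df2 <- f1(df1)                    # rows 1 to 4, NA heights kept
df3 <- f1(df1, keep_na = FALSE)   # rows 2 and 3 only, like plain subset()
```

Ending the function with the indexing expression (rather than an assignment, as in the question) also makes the result print when f1 is called at the console, since an assignment returns its value invisibly.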

Don't index directly with the raw condition (explained below):

df2 <- df1[df1$Height < 40, ]

Example

df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)

subset(df1, Height < 40 | is.na(Height))

#  Height y
#1     NA 1
#2      2 2
#3      4 3
#4     NA 4

df1[df1$Height < 40, ]

#     Height  y
#NA       NA NA
#2         2  2
#3         4  3
#NA.1     NA NA

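The latter fails because indexing by NA gives NA: each NA in the logical index produces a row filled with NA (with row names NA, NA.1, ...) instead of the original row. Consider this simple example with a vector: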

x <- 1:4
ind <- c(NA, TRUE, NA, FALSE)
x[ind]
# [1] NA  2 NA

We need to somehow replace those NA with TRUE. The most straightforward way is to add another "or" condition is.na(ind):

x[ind | is.na(ind)]
# [1] 1 2 3

This is exactly what happens in your situation. If Height contains NA, then the logical operation Height < 40 yields a mix of TRUE / FALSE / NA, so we need to replace each NA with TRUE as above.
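To see that mix on the data frame itself, here is a sketch with two other base R ways of turning NA into TRUE that give the same index as cond | is.na(cond) (cond, k1, k2 and k3 are just illustrative names):

```r
df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)

# The raw comparison: NA wherever Height is NA
cond <- df1$Height < 40               # values: NA TRUE TRUE NA FALSE FALSE

k1 <- cond | is.na(cond)              # the "or" trick used above
k2 <- cond %in% c(TRUE, NA)           # %in% (via match) treats NA as matching NA
k3 <- replace(cond, is.na(cond), TRUE)

identical(k1, k2) && identical(k1, k3)   # TRUE
df1[k1, ]                             # rows 1 to 4, NA heights intact
```

All three return a plain TRUE/FALSE vector with no NA left, so indexing with any of them can no longer produce the NA-filled rows shown earlier.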

answered Sep 28 '22 18:09

Zheyuan Li
