I have a data.frame that contains many columns. I want to keep the rows that have no NAs in 4 of these columns. The complication arises from the fact that I have other rows that are allowed have NAs in them so I can't use complete.cases or is.na. What's the most efficient way to do this?
By using na. omit() , complete. cases() , rowSums() , and drop_na() methods you can remove rows that contain NA ( missing values) from R data frame.
If we prefer to work with the Tidyverse package, we can use the filter() function to remove (or select) rows based on values in a column (conditionally, that is, and the same as using subset). Furthermore, we can also use the function slice() from dplyr to remove rows based on the index.
For example, if we have a data frame called df then we can remove rows that contain at least one 0 can be done by using the command df[apply(df,1, function(x) all(x!= 0)),].
You can still use complete.cases()
. Just apply it to the desired columns (columns 1:4 in the example below) and then use the Boolean vector it returns to select valid rows from the entire data.frame.
set.seed(4)
x <- as.data.frame(replicate(6, sample(c(1:10,NA))))
x[complete.cases(x[1:4]),]
# V1 V2 V3 V4 V5 V6
# 1 7 4 6 8 10 5
# 2 1 2 5 5 1 2
# 5 6 8 4 10 6 6
# 6 2 6 9 3 4 4
# 7 4 3 3 1 2 1
# 9 8 5 2 7 7 3
# 10 10 10 1 2 5 NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With