I have a data frame called data. I want to create a function f(data, collist) that takes the data frame and a list of its column names, and returns only those rows of data for which none of the columns named in collist is NA. I know it can be done with a for loop, but I want to do it without one.
Also, please let me know if it is generally more efficient in R to avoid loops.
Here is an example:
A B C D
1 2 NA NA
2 NA NA NA
NA 3 7 5
NA 4 2 NA
5 6 NA NA
If collist contains B and C, then a reduced data frame with rows 3 and 4 would be returned, because B or C or both are NA in rows 1, 2 and 5. I want a function because I will be using this operation many times. Through this question I hope to learn some new R tricks and make my whole program more elegant. Thanks.
To select the rows of an R data frame that contain no NA values, we can use the complete.cases function with single square brackets. For example, if we have a data frame called df that contains some missing values (NA), the rows without NA can be selected with the command df[complete.cases(df), ].
To remove all rows that have an NA, we can use the na.omit function. For example, if we have a data frame called df that contains some NA values, we can remove every row that contains at least one NA with the command na.omit(df).
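The two commands described above can be compared side by side. This is a minimal sketch on made-up data; the data frame contents and the names keep, by_index and by_omit are assumptions for illustration only.

```r
# Hypothetical example data; the name df matches the text above.
df <- data.frame(x = c(1, NA, 3, 4), y = c("a", "b", NA, "d"))

# Logical vector: TRUE for rows with no NA in any column.
keep <- complete.cases(df)

by_index <- df[keep, ]   # subset with single square brackets
by_omit  <- na.omit(df)  # same rows, plus an "na.action" attribute

# Both approaches keep the same rows (here, rows 1 and 4).
rownames(by_index)
rownames(by_omit)
```

The main practical difference is that na.omit records which rows were dropped in the "na.action" attribute, while indexing with complete.cases lets the check be restricted to a subset of columns.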
If we want to exclude missing values from mathematical operations, we can use the na.rm = TRUE argument. If these values are not excluded, most functions will return NA. We may also want to subset our data to obtain complete observations, that is, the observations (rows) in our data that contain no missing data.
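As a small illustration of the na.rm argument, assuming a made-up numeric vector x:

```r
# Hypothetical vector with one missing value.
x <- c(2, NA, 4)

mean(x)                # NA, because the missing value propagates
mean(x, na.rm = TRUE)  # 3, the mean of the non-missing values
sum(x, na.rm = TRUE)   # 6
```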
Remove Rows with NA From R Dataframe. By using the na.omit(), complete.cases(), rowSums(), and drop_na() methods you can remove rows that contain NA (missing values) from an R data frame.
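The rowSums() variant is probably the least obvious of these, so here is a rough sketch of how it could look. The data and the names na_per_row and complete_rows are assumptions; drop_na() comes from the tidyr package and is left out to keep the example in base R.

```r
# Hypothetical example data.
df <- data.frame(x = c(1, NA, 3), y = c(NA, NA, 6))

# is.na(df) is a logical matrix; rowSums counts the NA values in each row.
na_per_row <- rowSums(is.na(df))

# Keep only the rows with zero missing values.
complete_rows <- df[na_per_row == 0, ]
complete_rows
```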
It sounds like you are just looking for complete.cases. Here's an example:
#### SAMPLE DATA
set.seed(1)
m <- matrix(rnorm(20), 5)
m[sample(length(m), 7)] <- NA
mydf <- data.frame(m)
mydf
# X1 X2 X3 X4
# 1 NA -0.8204684 1.511781 -0.04493361
# 2 0.1836433 0.4874291 NA NA
# 3 -0.8356286 0.7383247 NA 0.94383621
# 4 1.5952808 NA -2.214700 0.82122120
# 5 0.3295078 NA NA 0.59390132
#### SAMPLE EXTRACTION
collist <- c("X1", "X2")
mydf[complete.cases(mydf[collist]), collist]
# X1 X2
# 2 0.1836433 0.4874291
# 3 -0.8356286 0.7383247
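Since the question asks for a reusable function f(data, collist), the same idea could be wrapped up roughly as follows. This is only a sketch: the column-name check and the choice to return all columns (rather than just those in collist, as in the extraction above) are assumptions about what is wanted.

```r
# Hypothetical helper: keep the rows of `data` that have no NA
# in any of the columns named in `collist`.
f <- function(data, collist) {
  # Fail early on misspelled column names.
  missing_cols <- setdiff(collist, names(data))
  if (length(missing_cols) > 0) {
    stop("Unknown column(s): ", paste(missing_cols, collapse = ", "))
  }
  # drop = FALSE keeps a data frame even when collist has one column.
  keep <- complete.cases(data[, collist, drop = FALSE])
  data[keep, , drop = FALSE]
}

# The example table from the question.
data <- data.frame(
  A = c(1, 2, NA, NA, 5),
  B = c(2, NA, 3, 4, 6),
  C = c(NA, NA, 7, 2, NA),
  D = c(NA, NA, 5, NA, NA)
)

f(data, c("B", "C"))  # rows where both B and C are present
f(data, "A")          # a single column name also works
```

complete.cases is vectorized, so no explicit loop is needed. For this kind of row filtering, a vectorized call is usually faster than an R-level for loop over the rows, although the difference only matters on larger data.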