I have a six-column data frame with NAs. I wish to select only those rows which contain a maximum of, say, three NAs.
I could find the total number of NAs using sum(is.na(my.df[,c(1:6)])),
but was not able to select the subset of the data frame using 'subset' or any other function with the condition sum(is.na(my.df[,c(1:6)])) <= 3.
Eventually I wish to calculate the median of each of the selected rows. The sample data is shown below:
C1 C2 C3 C4 C5 C6
6.4 NA 6.1 6.2 NA NA
7.1 6.4 6.5 5.9 7 6.9
7.1 7 6.9 6.9 6.9 7
6.9 NA 6.9 NA 7.1 NA
6.8 NA 7.1 7.1 6.8 7.2
NA NA NA NA NA 6.4
NA NA NA NA NA 6.7
Thanks in advance
Use rowSums:
> mydf[rowSums(is.na(mydf)) <= 3, ]
C1 C2 C3 C4 C5 C6
1 6.4 NA 6.1 6.2 NA NA
2 7.1 6.4 6.5 5.9 7.0 6.9
3 7.1 7.0 6.9 6.9 6.9 7.0
4 6.9 NA 6.9 NA 7.1 NA
5 6.8 NA 7.1 7.1 6.8 7.2
Step-by-step:
How many NAs per row?
> rowSums(is.na(mydf))
[1] 3 0 0 3 1 5 5
How many of those are less than or equal to 3?
> rowSums(is.na(mydf)) <= 3
[1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE
And R can use that to subset. It will keep the TRUE rows (1, 2, 3, 4, 5) and discard the FALSE ones (6, 7).
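The question also asks for the median of each selected row, which the steps above do not cover. Here is a minimal sketch of one way to do it, assuming the sample data is rebuilt by hand as a data frame called mydf; the names kept and row_medians are only illustrative. It uses apply() over rows with na.rm = TRUE so that the remaining NAs are ignored when computing each median.

```r
# Sample data from the question, rebuilt column by column (assumed layout)
mydf <- data.frame(
  C1 = c(6.4, 7.1, 7.1, 6.9, 6.8, NA, NA),
  C2 = c(NA, 6.4, 7.0, NA, NA, NA, NA),
  C3 = c(6.1, 6.5, 6.9, 6.9, 7.1, NA, NA),
  C4 = c(6.2, 5.9, 6.9, NA, 7.1, NA, NA),
  C5 = c(NA, 7.0, 6.9, 7.1, 6.8, NA, NA),
  C6 = c(NA, 6.9, 7.0, NA, 7.2, 6.4, 6.7)
)

# Keep only the rows with at most three NAs
kept <- mydf[rowSums(is.na(mydf)) <= 3, ]

# Median of each kept row, ignoring the NAs that remain
row_medians <- apply(kept, 1, median, na.rm = TRUE)
print(row_medians)
```

apply() converts the data frame to a matrix first, which is fine here because every column is numeric. With mixed column types it would be safer to select the numeric columns before calling it.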