R: selecting rows that contain a given number of NAs

Tags:

r

I have a six-column data frame with NAs. I wish to select only those rows which contain a maximum of three NAs. I could find the number of NAs using sum(is.na(my.df[,c(1:6)])), but I was not able to select the subset of the data frame using 'subset' or any other function with the condition sum(is.na(my.df[,c(1:6)])) <= 3. Eventually I wish to calculate the median of each of the selected rows. The sample data is shown below:

C1  C2  C3  C4  C5  C6
6.4 NA 6.1 6.2 NA NA
7.1 6.4 6.5 5.9 7 6.9
7.1 7 6.9 6.9 6.9 7
6.9 NA 6.9 NA 7.1 NA
6.8 NA 7.1 7.1 6.8 7.2
NA NA NA NA NA 6.4
NA NA NA NA NA 6.7

Thanks in advance

The August asked Mar 22 '23 13:03


1 Answer

Use rowSums:

> mydf[rowSums(is.na(mydf)) <= 3, ]
   C1  C2  C3  C4  C5  C6
1 6.4  NA 6.1 6.2  NA  NA
2 7.1 6.4 6.5 5.9 7.0 6.9
3 7.1 7.0 6.9 6.9 6.9 7.0
4 6.9  NA 6.9  NA 7.1  NA
5 6.8  NA 7.1 7.1 6.8 7.2

Step-by-step:

  • How many NAs per row?

    > rowSums(is.na(mydf))
    [1] 3 0 0 3 1 5 5
    
  • Which of those counts are less than or equal to 3?

    > rowSums(is.na(mydf)) <= 3
    [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
    

R can use that logical vector to subset the rows: it keeps the rows where the value is TRUE (1, 2, 3, 4, 5) and discards the rows where it is FALSE (6, 7).
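For the last part of the question (the median of each selected row), here is a minimal sketch that continues from the same filter. It assumes the sample data is read into mydf with read.table; the names total_nas, kept and row_medians are only illustrative, and apply() with median(na.rm = TRUE) is one possible way to get per-row medians, not the only one.

```r
# Rebuild the sample data from the question (assumed to be all numeric).
mydf <- read.table(header = TRUE, text = "
C1  C2  C3  C4  C5  C6
6.4 NA 6.1 6.2 NA NA
7.1 6.4 6.5 5.9 7 6.9
7.1 7 6.9 6.9 6.9 7
6.9 NA 6.9 NA 7.1 NA
6.8 NA 7.1 7.1 6.8 7.2
NA NA NA NA NA 6.4
NA NA NA NA NA 6.7
")

# sum() collapses the whole data frame into a single count, which is why
# it cannot be used to pick individual rows.
total_nas <- sum(is.na(mydf))

# rowSums() gives one count per row, so the comparison yields one
# TRUE/FALSE per row; keep the rows with at most three NAs.
kept <- mydf[rowSums(is.na(mydf)) <= 3, ]

# Median of each remaining row, ignoring the NAs in that row.
row_medians <- apply(kept, 1, median, na.rm = TRUE)
print(row_medians)
```

With the sample data this keeps rows 1 to 5, and the medians come out as 6.2, 6.7, 6.95, 6.9 and 7.1. Note that na.rm = TRUE is needed here, because median() returns NA for any row that still contains an NA.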

A5C1D2H2I1M1N2O1R2T1 answered Mar 25 '23 02:03