R: selecting rows that contain a given number of NAs

Question

I have a six-column data frame with NAs. I want to select only the rows that contain at most three NAs. I can count the NAs with sum(is.na(my.df[, 1:6])), but I could not select the subset of the data frame, with subset() or any other function, using the condition sum(is.na(my.df[, 1:6])) <= 3. Eventually I want to calculate the median of each selected row. The sample data is shown below:

C1  C2  C3  C4  C5  C6
6.4 NA 6.1 6.2 NA NA
7.1 6.4 6.5 5.9 7 6.9
7.1 7 6.9 6.9 6.9 7
6.9 NA 6.9 NA 7.1 NA
6.8 NA 7.1 7.1 6.8 7.2
NA NA NA NA NA 6.4
NA NA NA NA NA 6.7

Thanks in advance
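To make the examples reproducible, the sample data above can be loaded with base R's read.table(). This is a minimal sketch that assumes the data frame is named mydf (the name the answer uses) and that "NA" in the text marks a missing value, which is read.table()'s default.

```r
# Read the question's sample table into a data frame named mydf
mydf <- read.table(header = TRUE, text = "
C1  C2  C3  C4  C5  C6
6.4 NA 6.1 6.2 NA NA
7.1 6.4 6.5 5.9 7 6.9
7.1 7 6.9 6.9 6.9 7
6.9 NA 6.9 NA 7.1 NA
6.8 NA 7.1 7.1 6.8 7.2
NA NA NA NA NA 6.4
NA NA NA NA NA 6.7
")

# Seven rows, six numeric columns
str(mydf)
```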

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

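Use rowSums. sum() returns a single NA count for the whole data frame, so it cannot act as a per-row condition; rowSums(is.na(mydf)) returns one count per row, which can be used to index the rows: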

> mydf[rowSums(is.na(mydf)) <= 3, ]
   C1  C2  C3  C4  C5  C6
1 6.4  NA 6.1 6.2  NA  NA
2 7.1 6.4 6.5 5.9 7.0 6.9
3 7.1 7.0 6.9 6.9 6.9 7.0
4 6.9  NA 6.9  NA 7.1  NA
5 6.8  NA 7.1 7.1 6.8 7.2
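Since the question mentions subset(), the same per-row condition also works there. The sketch below assumes the sample data is loaded as mydf; it also shows the "exactly n NAs" variant from the title, using == instead of <=.

```r
mydf <- read.table(header = TRUE, text = "
C1  C2  C3  C4  C5  C6
6.4 NA 6.1 6.2 NA NA
7.1 6.4 6.5 5.9 7 6.9
7.1 7 6.9 6.9 6.9 7
6.9 NA 6.9 NA 7.1 NA
6.8 NA 7.1 7.1 6.8 7.2
NA NA NA NA NA 6.4
NA NA NA NA NA 6.7
")

# subset() keeps the rows where the logical condition is TRUE;
# the condition must be one value per row, hence rowSums() rather than sum()
kept <- subset(mydf, rowSums(is.na(mydf)) <= 3)
print(kept)

# Rows with exactly three NAs (rows 1 and 4 in the sample data)
exactly3 <- mydf[rowSums(is.na(mydf)) == 3, ]
print(exactly3)
```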

Step-by-step:

How many NAs per row?

> rowSums(is.na(mydf))
[1] 3 0 0 3 1 5 5
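This is also why the approach in the question did not work: is.na() on a data frame gives a logical matrix, and sum() collapses the whole matrix into one number, while rowSums() adds up each row separately (TRUE counts as 1). A minimal comparison, assuming the sample data is loaded as mydf:

```r
mydf <- read.table(header = TRUE, text = "
C1  C2  C3  C4  C5  C6
6.4 NA 6.1 6.2 NA NA
7.1 6.4 6.5 5.9 7 6.9
7.1 7 6.9 6.9 6.9 7
6.9 NA 6.9 NA 7.1 NA
6.8 NA 7.1 7.1 6.8 7.2
NA NA NA NA NA 6.4
NA NA NA NA NA 6.7
")

na_flags <- is.na(mydf)   # logical matrix, TRUE where a value is missing

total_na <- sum(na_flags)       # a single total for the whole data frame (17)
per_row  <- rowSums(na_flags)   # one count per row: 3 0 0 3 1 5 5

total_na
per_row
```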

Which of those are less than or equal to 3?

> rowSums(is.na(mydf)) <= 3
[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

R can use that logical vector to subset the rows: it keeps the TRUE rows (1 to 5) and discards the FALSE ones (6 and 7).
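The question's final goal, the median of each selected row, can then be computed with apply() over rows (MARGIN = 1). This is a sketch assuming the sample data is loaded as mydf; na.rm = TRUE is passed through to median() so that the NAs still present in the kept rows are ignored.

```r
mydf <- read.table(header = TRUE, text = "
C1  C2  C3  C4  C5  C6
6.4 NA 6.1 6.2 NA NA
7.1 6.4 6.5 5.9 7 6.9
7.1 7 6.9 6.9 6.9 7
6.9 NA 6.9 NA 7.1 NA
6.8 NA 7.1 7.1 6.8 7.2
NA NA NA NA NA 6.4
NA NA NA NA NA 6.7
")

# Keep rows with at most three NAs
kept <- mydf[rowSums(is.na(mydf)) <= 3, ]

# Median of each kept row, ignoring the remaining NAs
# (medians: 6.2, 6.7, 6.95, 6.9, 7.1 for rows 1 to 5)
row_medians <- apply(kept, 1, median, na.rm = TRUE)
row_medians
```

Without na.rm = TRUE, rows 1, 4 and 5 would return NA, because they still contain missing values.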
