R - check if NA exists in any column of r dataframe row, then if so remove that row [duplicate]

Tags:

r

I have a large dataframe that has many rows and columns, and I would like to remove the rows for which at least 1 column is NA / NaN. Below is a small example of the dataframe I am working with:

  team_id athlete_id GP tm_STL tm_TOV   player_WS
1   13304      75047  1      2      8         NaN
2   13304      75048  1      2      8  0.28563827
3   13304      75049  1      2      8         NaN
4   13304      75050  1      2      8         NaN
5   13304      75053  1      2      8  0.03861989
6   13304      75060  1      2      8 -0.15530707

...albeit a bad example, because all of the NaNs show up in the last column in this case. I am familiar with using which(is.na(df$column_name)) to get the rows with NA values in an individual column, but I want to do the same for rows where at least 1 column has an NA value.

Thanks!

Canovice asked Aug 12 '16 18:08


3 Answers

Try using complete.cases.

> df <- data.frame(col1 = c(1, 2, 3, NA, 5), col2 = c('A', 'B', NA, 'C', 'D'),
             col3 = c(9, NaN, 8, 7, 6))
> df
  col1 col2 col3
1    1    A    9
2    2    B  NaN
3    3 <NA>    8
4   NA    C    7
5    5    D    6
> df[complete.cases(df), ]
  col1 col2 col3
1    1    A    9
5    5    D    6
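
As a minimal sketch of the same idea applied to the data in the question, the block below rebuilds the six sample rows by hand (the values are copied from the post; the variable names keep and clean are only illustrative) and shows the logical vector that complete.cases returns before it is used as a row index.

```r
# Rebuild the six sample rows from the question (values copied from the post).
df <- data.frame(
  team_id    = rep(13304, 6),
  athlete_id = c(75047, 75048, 75049, 75050, 75053, 75060),
  GP         = rep(1, 6),
  tm_STL     = rep(2, 6),
  tm_TOV     = rep(8, 6),
  player_WS  = c(NaN, 0.28563827, NaN, NaN, 0.03861989, -0.15530707)
)

# complete.cases() returns one logical per row:
# TRUE when no column in that row is NA or NaN.
keep <- complete.cases(df)
print(keep)

# Use the logical vector as a row index; the trailing comma keeps all columns.
clean <- df[keep, ]
print(clean)
```

Note that the original row names (2, 5, 6) are preserved in the result; rownames(clean) <- NULL resets them if a fresh 1..n index is wanted.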
Sam answered Oct 12 '22 01:10


You can use this.

df[rowSums(is.na(df))==0,]

#  team_id athlete_id GP tm_STL tm_TOV   player_WS
#2   13304      75048  1      2      8  0.28563827
#5   13304      75053  1      2      8  0.03861989
#6   13304      75060  1      2      8 -0.15530707

This counts the number of NAs in each row. You keep only the rows where that count is zero.
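
To make the intermediate steps visible, here is a small sketch that uses the toy data frame from the complete.cases answer rather than the asker's data; the names na_mask, na_per_row and result are illustrative only.

```r
# Toy data frame with one missing value in each of rows 2, 3 and 4.
df <- data.frame(col1 = c(1, 2, 3, NA, 5),
                 col2 = c("A", "B", NA, "C", "D"),
                 col3 = c(9, NaN, 8, 7, 6))

# is.na() on a data frame gives a logical matrix of the same shape;
# NaN is reported as TRUE as well as NA.
na_mask <- is.na(df)

# rowSums() treats TRUE as 1, so this is the number of missing values per row.
na_per_row <- rowSums(na_mask)
print(na_per_row)

# Keep only the rows that have no missing values at all.
result <- df[na_per_row == 0, ]
print(result)
```

One reason to prefer the counting form is that the condition is easy to relax: df[na_per_row <= 1, ] would keep rows with at most one missing value, which complete.cases cannot express directly.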

milan answered Oct 12 '22 00:10


na.omit works:

na.omit(df)
##   team_id athlete_id GP tm_STL tm_TOV   player_WS
## 2   13304      75048  1      2      8  0.28563827
## 5   13304      75053  1      2      8  0.03861989
## 6   13304      75060  1      2      8 -0.15530707

It's a little more convenient than complete.cases if you're piping, as it doesn't require another function to subset like dplyr::filter, magrittr::extract, or [.
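
A rough illustration of the piping point, assuming R 4.1 or later for the native |> pipe (the answer itself does not say which pipe it has in mind); the data frame and the names cleaned and cleaned2 are made up for the example.

```r
# Small numeric data frame: row 2 has a NaN, row 4 has an NA.
df <- data.frame(col1 = c(1, 2, 3, NA, 5),
                 col3 = c(9, NaN, 8, 7, 6))

# na.omit() takes the data frame and returns the filtered data frame,
# so it drops straight into a pipeline (native pipe, R >= 4.1).
cleaned <- df |> na.omit()
print(cleaned)

# complete.cases() only returns a logical vector, so a separate
# subsetting step is still needed to get the rows.
cleaned2 <- df[complete.cases(df), ]

# Unlike the subsetting form, na.omit() records which rows were dropped
# in an "na.action" attribute on the result.
print(attr(cleaned, "na.action"))
```

Both forms keep the same rows; the only difference in the result is the extra na.action attribute that na.omit attaches.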

alistaire answered Oct 12 '22 00:10