I have a large dataframe that has many rows and columns, and I would like to remove the rows for which at least 1 column is NA / NaN. Below is a small example of the dataframe I am working with:
team_id athlete_id GP tm_STL tm_TOV player_WS
1 13304 75047 1 2 8 NaN
2 13304 75048 1 2 8 0.28563827
3 13304 75049 1 2 8 NaN
4 13304 75050 1 2 8 NaN
5 13304 75053 1 2 8 0.03861989
6 13304 75060 1 2 8 -0.15530707
...albeit a bad example, because all of the NaNs show up in the last column in this case. I am familiar with the approach of which(is.na(df$column_name)) for getting the rows with NA values from an individual column, but I want to do something like this for rows where at least one column has an NA value.
Thanks!
Try using complete.cases.
> df <- data.frame(col1 = c(1, 2, 3, NA, 5), col2 = c('A', 'B', NA, 'C', 'D'),
col3 = c(9, NaN, 8, 7, 6))
> df
col1 col2 col3
1 1 A 9
2 2 B NaN
3 3 <NA> 8
4 NA C 7
5 5 D 6
> df[complete.cases(df), ]
col1 col2 col3
1 1 A 9
5 5 D 6
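For a closer look at what is happening, here is a minimal sketch using the same df as above. complete.cases returns one logical value per row, and that vector is what does the subsetting. The variable name keep is only for illustration.

```r
# Same example data frame as in the answer above.
df <- data.frame(col1 = c(1, 2, 3, NA, 5), col2 = c('A', 'B', NA, 'C', 'D'),
                 col3 = c(9, NaN, 8, 7, 6))

# One logical per row: TRUE only if the row has no NA/NaN in any column.
keep <- complete.cases(df)
keep
# [1]  TRUE FALSE FALSE FALSE  TRUE

# Indices of the rows that contain at least one missing value,
# analogous to which(is.na(df$column_name)) for a single column.
bad_rows <- which(!keep)
bad_rows
# [1] 2 3 4
```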
You can use this.
df[rowSums(is.na(df))==0,]
# team_id athlete_id GP tm_STL tm_TOV player_WS
#2 13304 75048 1 2 8 0.28563827
#5 13304 75053 1 2 8 0.03861989
#6 13304 75060 1 2 8 -0.15530707
This way you count the number of NAs per row. You only keep the rows where that count is zero.
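To make the steps explicit, here is a small sketch that breaks the one-liner apart. The data frame and the names na_mask, na_count, and clean are made up for illustration and are not the asker's data.

```r
# Toy data frame; column names and values are illustrative only.
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c("x", "y", NA, "z"),
                 c = c(0.5, NaN, 2.5, 3.5))

na_mask  <- is.na(df)         # logical matrix, TRUE where a cell is NA or NaN
na_count <- rowSums(na_mask)  # missing cells per row: 0, 2, 1, 0

clean <- df[na_count == 0, ]  # keep only rows with zero missing cells
clean
```

Changing the comparison (for example na_count <= 1) would keep rows with a limited number of missing values, which complete.cases and na.omit do not offer directly.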
na.omit works:
na.omit(df)
## team_id athlete_id GP tm_STL tm_TOV player_WS
## 2 13304 75048 1 2 8 0.28563827
## 5 13304 75053 1 2 8 0.03861989
## 6 13304 75060 1 2 8 -0.15530707
It's a little more convenient than complete.cases if you're piping, as it doesn't require another function to subset like dplyr::filter, magrittr::extract, or [.
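As a rough sketch of that difference, assuming R 4.1 or later for the native |> pipe (the magrittr %>% pipe should behave the same way here) and a made-up data frame:

```r
# Toy data frame; names and values are illustrative only.
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# na.omit() takes the data frame directly, so it drops straight into a pipeline.
res1 <- df |> na.omit()

# complete.cases() only returns a logical vector, so a separate
# subsetting step with [ is still needed.
res2 <- df[complete.cases(df), ]
```

Both results contain the same rows. One difference to be aware of: na.omit also records the dropped rows in an "na.action" attribute on the result, so identical(res1, res2) is FALSE even though the data match.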