I'm trying to filter by NAs (keeping only the rows with NA in the specified column) using dplyr and its filter function. The code below just returns the column labels with no data. Am I writing the code correctly? Also, if it's possible (or easier) to do this without dplyr, that would be interesting to know as well. Thanks.
filter(tata4, CompleteSolution == "NA", KeptInformed == "NA")
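Comparing with == "NA" looks for the literal string "NA"; for a real missing value the comparison itself returns NA, and filter() drops those rows. You could use complete.cases() instead, negated, to keep the rows that have an NA in either column: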
dplyr::filter(df, !complete.cases(col1, col2))
Which gives:
#  col1 col2 col3
#1   NA    5    5
#2   NA    6    6
#3    5   NA    7
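For the "without dplyr" part of the question, here is a minimal base R sketch built on is.na(). It rebuilds the example data with data.frame() so that it runs on its own, and the names eq_result, either_na and both_na are purely illustrative. It also separates "NA in either column" from "NA in both columns", since the comma in the original filter() call means AND.

```r
# Same values as the example data, rebuilt here so the block is self-contained
df <- data.frame(
  col1 = c(1, 2, 3, 4, NA, NA, 5),
  col2 = c(1, 2, 3, 4, 5, 6, NA),
  col3 = c(1, 2, 3, 4, 5, 6, 7)
)

# Comparing with the string "NA" is never TRUE for a missing value:
# the result is NA where col1 is missing and FALSE everywhere else
eq_result <- df$col1 == "NA"

# is.na() returns TRUE/FALSE, so it can be used directly for subsetting
either_na <- df[is.na(df$col1) | is.na(df$col2), ]  # rows 5, 6 and 7
both_na   <- df[is.na(df$col1) & is.na(df$col2), ]  # no rows in this data

print(either_na)
```

With dplyr loaded, the equivalent calls would be filter(df, is.na(col1) | is.na(col2)) for "either" and filter(df, is.na(col1), is.na(col2)) for "both".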
Benchmark
large_df <- df[rep(seq_len(nrow(df)), 10e5), ]
The results so far:
library(dplyr)           # needed for the unqualified filter() call below
library(microbenchmark)
mbm <- microbenchmark(
  akrun1 = large_df[rowSums(is.na(large_df[1:2]))!=0, ],
  akrun2 = large_df[Reduce(`|`, lapply(large_df[1:2], is.na)), ],
  steven = filter(large_df, !complete.cases(col1, col2)),
  times = 10)

#Unit: milliseconds
#   expr      min       lq      mean    median        uq       max neval cld
# akrun1 814.0226 924.0837 1248.9911 1208.7924 1434.2415 2057.1338    10   c
# akrun2 499.3404 671.9900  736.2418  687.9194  861.4477 1068.1232    10  b 
# steven 112.9394 113.0604  214.1688  198.4542  299.7585  355.1795    10 a 
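Before comparing timings, it may be worth checking that the three expressions select the same rows. The sketch below does this on the small example data in base R only; it calls complete.cases() directly rather than through filter(), and the idx_* names are illustrative, not part of the original benchmark.

```r
# Small example data, rebuilt here so the block is self-contained
df <- data.frame(
  col1 = c(1, 2, 3, 4, NA, NA, 5),
  col2 = c(1, 2, 3, 4, 5, 6, NA),
  col3 = c(1, 2, 3, 4, 5, 6, 7)
)

cols <- df[1:2]  # the two columns being checked for NA

# Logical row index produced by each approach in the benchmark
idx_rowsums  <- unname(rowSums(is.na(cols)) != 0)
idx_reduce   <- Reduce(`|`, lapply(cols, is.na))
idx_complete <- !complete.cases(cols)

# All three should flag exactly the same rows
stopifnot(identical(idx_rowsums, idx_reduce))
stopifnot(identical(idx_reduce, idx_complete))

print(which(idx_complete))
```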
Data
df <- structure(list(col1 = c(1, 2, 3, 4, NA, NA, 5), col2 = c(1, 2, 
3, 4, 5, 6, NA), col3 = c(1, 2, 3, 4, 5, 6, 7)), .Names = c("col1", 
"col2", "col3"), row.names = c(NA, -7L), class = "data.frame")