I want to insert a new column into a data.frame whose value is TRUE when there is at least one missing value in the row and FALSE otherwise.
For that problem, apply is a perfect use case:
tab <- data.frame(a = 1:10, b = c(NA, letters[2:10]), c = c(LETTERS[1:9], NA))
tab$missing <- apply(tab, 1, function(x) any(is.na(x)))
However, I loaded the strict package and got this error: apply() coerces X to a matrix so is dangerous to use with data frames. Please use lapply() instead.
I know that I can safely ignore this error. However, I was wondering whether there is a simple way to code it using one of the tidyverse packages. I tried unsuccessfully with dplyr:
tab %>%
  rowwise() %>%
  mutate(missing = any(is.na(.), na.rm = TRUE))
Apply functions are a family of functions in base R that let you repeatedly perform an action on multiple chunks of data. An apply function is essentially a loop: it often requires less code than an explicit for loop, but it is not necessarily faster.
You can use the apply() function to apply a function to each row in a matrix or data frame in R.
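As a minimal sketch of the row-wise and column-wise use of apply(), assuming a small made-up matrix m (not part of the original question):

```r
# Illustrative matrix with one missing value (filled column by column)
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 3)

# MARGIN = 1 applies the function to each row, MARGIN = 2 to each column
row_has_na <- apply(m, 1, function(x) any(is.na(x)))
col_has_na <- apply(m, 2, function(x) any(is.na(x)))
```

Here row_has_na flags only the second row and col_has_na flags only the first column.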
rowwise() allows you to compute on a data frame a row-at-a-time. This is most useful when a vectorised function doesn't exist. Most dplyr verbs preserve row-wise grouping.
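The attempt in the question most likely fails because `.` refers to the whole data frame rather than to the current row, so any(is.na(.)) is TRUE for every row. A possible rowwise() sketch, assuming the columns are named a, b and c as in the example data and are referenced explicitly:

```r
library(dplyr)

tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA))

res <- tab %>%
  rowwise() %>%
  # inside rowwise(), a, b and c each hold the single value of the current row;
  # base c() coerces them to a common type before is.na() is applied
  mutate(missing = any(is.na(c(a, b, c)))) %>%
  ungroup()
```

Naming the columns by hand does not scale to wide tables, so this is only meant to show how row-wise evaluation works.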
If you want to avoid coercing to a matrix, you can use purrr::pmap, which iterates across the elements of a list in parallel and passes them to a function:
library(tidyverse)
tab <- data_frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA))
tab %>% mutate(missing = pmap_lgl(., ~any(is.na(c(...)))))
#> # A tibble: 10 x 4
#>        a b     c     missing
#>    <int> <chr> <chr> <lgl>
#>  1     1 <NA>  A     TRUE
#>  2     2 b     B     FALSE
#>  3     3 c     C     FALSE
#>  4     4 d     D     FALSE
#>  5     5 e     E     FALSE
#>  6     6 f     F     FALSE
#>  7     7 g     G     FALSE
#>  8     8 h     H     FALSE
#>  9     9 i     I     FALSE
#> 10    10 j     <NA>  TRUE
In the function, c is necessary to pull all the parameters passed to the function (...) into a vector, which can be passed to is.na and collapsed with any. The *_lgl-suffixed pmap simplifies the result to a Boolean vector.
Note that while this avoids coercing to matrix, it will not necessarily be faster than approaches that do, as matrix operations are highly optimized in R. It may make more sense to explicitly coerce to a matrix, e.g.
tab %>% mutate(missing = rowSums(is.na(as.matrix(.))) > 0)
which returns the same thing.
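A column-wise alternative, in the spirit of the lapply() suggestion in the error message, avoids both the matrix coercion and the row-by-row iteration. This is only a sketch using base R; the intermediate name na_by_col is made up for illustration:

```r
tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA))

# is.na() on each column gives one logical vector per column
na_by_col <- lapply(tab, is.na)

# Reduce() combines those vectors element-wise with `|`,
# so a row is flagged if any of its columns is NA
tab$missing <- Reduce(`|`, na_by_col)
```

Because each column keeps its own type, nothing is converted to character along the way.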
This works for the example data:
library(tidyverse)
tab <- data_frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA))
tab_1 <- tab %>% mutate(missing = ifelse(is.na(b), TRUE, ifelse(is.na(c), TRUE, FALSE)))
> tab_1
    a    b    c missing
1   1 <NA>    A    TRUE
2   2    b    B   FALSE
3   3    c    C   FALSE
4   4    d    D   FALSE
5   5    e    E   FALSE
6   6    f    F   FALSE
7   7    g    G   FALSE
8   8    h    H   FALSE
9   9    i    I   FALSE
10 10    j <NA>    TRUE
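The nested ifelse hard-codes columns b and c. Two possible simplifications are sketched here: combining the tests directly with `|`, and, assuming a dplyr version that provides if_any() (1.0.4 or later, as far as I know), checking every column at once. The names tab_2 and tab_3 are made up for illustration:

```r
library(dplyr)

tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA))

# Same logic as the nested ifelse, written as a single vectorised expression
tab_2 <- tab %>% mutate(missing = is.na(b) | is.na(c))

# Generalisation to all columns; if_any() returns TRUE for a row
# when the predicate holds for at least one selected column
tab_3 <- tab %>% mutate(missing = if_any(everything(), is.na))
```

On the example data both versions should agree with the ifelse result, since column a has no missing values.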
You can use the complete.cases function:
tab %>% mutate(missing = !complete.cases(.))
To remove rows with one or more NAs, use:
tab %>% filter(complete.cases(.))