
Equivalent of apply() by row in the tidyverse?

Tags:

r

dplyr

tidyverse

I want to add a new column to a data.frame whose value is TRUE when there is at least one missing value in the row and FALSE otherwise.

For that problem, apply is a perfect use case:

EDIT - added example

tab <- data.frame(a = 1:10, b = c(NA, letters[2:10]), c = c(LETTERS[1:9], NA))

tab$missing <- apply(tab, 1, function(x) any(is.na(x)))

However, I loaded the strict package and got this error: apply() coerces X to a matrix so is dangerous to use with data frames. Please use lapply() instead.
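The coercion that the message refers to can be seen directly. A minimal sketch using the example data; the names `m` and `row_types` are only illustrative, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version:

```r
# Example data from the question
tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA),
                  stringsAsFactors = FALSE)

# apply() operates on as.matrix(tab); with mixed column types
# the result is a character matrix
m <- as.matrix(tab)
is.character(m)

# Inside the function passed to apply(), every row is a character vector,
# so the integer column `a` has lost its type
row_types <- apply(tab, 1, function(x) typeof(x))
unique(row_types)
```

For an `is.na()` check the coercion is harmless, because missing values stay missing, but any type-sensitive row function would see strings instead of numbers.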

I know that I can safely ignore this error. However, I was wondering whether there is a simple way to code this using one of the tidyverse packages. I tried with dplyr, without success:

tab %>% 
  rowwise() %>% 
  mutate(missing = any(is.na(.), na.rm = TRUE))
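A likely reason the attempt above does not give a per-row result: inside `mutate()`, `.` stands for the whole piped data frame rather than the current row, so `any(is.na(.))` is TRUE for every row. A hedged sketch of a per-row check that stays within dplyr, assuming dplyr 1.0.4 or later (which provides `if_any()` and postdates this question):

```r
library(dplyr)

tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA),
                  stringsAsFactors = FALSE)

# if_any() applies is.na() to each selected column and combines the
# per-column results with OR, giving one logical value per row
res <- tab %>% mutate(missing = if_any(everything(), is.na))
res$missing
```

No `rowwise()` is needed here, because the check is vectorised over columns.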
Kevin Zarca asked Jul 06 '17 14:07



3 Answers

If you want to avoid coercing to a matrix, you can use purrr::pmap, which iterates across the elements of a list in parallel and passes them to a function:

library(tidyverse)

# tibble() replaces the deprecated data_frame()
tab <- tibble(a = 1:10, 
              b = c(NA, letters[2:10]), 
              c = c(LETTERS[1:9], NA))

tab %>% mutate(missing = pmap_lgl(., ~any(is.na(c(...)))))
#> # A tibble: 10 x 4
#>        a     b     c missing
#>    <int> <chr> <chr>   <lgl>
#>  1     1  <NA>     A    TRUE
#>  2     2     b     B   FALSE
#>  3     3     c     C   FALSE
#>  4     4     d     D   FALSE
#>  5     5     e     E   FALSE
#>  6     6     f     F   FALSE
#>  7     7     g     G   FALSE
#>  8     8     h     H   FALSE
#>  9     9     i     I   FALSE
#> 10    10     j  <NA>    TRUE

In the function, c() is needed to collect all the arguments passed through ... into a single vector, which can then be passed to is.na and collapsed with any. The *_lgl variant of pmap simplifies the result to a logical vector.
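To make the role of `c(...)` concrete, here is a rough base R analogue of the same idea. It is not purrr's implementation: `row_has_na` is a hypothetical helper, and `mapply()` stands in for `pmap_lgl()` by walking the columns in parallel.

```r
tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA),
                  stringsAsFactors = FALSE)

# Hypothetical helper: gather everything passed through ... into one vector,
# then test that vector for missing values
row_has_na <- function(...) any(is.na(c(...)))

# One row's values, passed as separate arguments
row_has_na(1L, NA, "A")

# mapply() calls the helper once per row, with one element from each column
missing <- unname(mapply(row_has_na, tab$a, tab$b, tab$c))
missing
```

Note that `c(...)` coerces the mixed values of a row to a common type, but only one row at a time and without building a matrix.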

Note that while this avoids coercing to matrix, it will not necessarily be faster than approaches that do, as matrix operations are highly optimized in R. It may make more sense to explicitly coerce to a matrix, e.g.

tab %>% mutate(missing = rowSums(is.na(as.matrix(.))) > 0)

which returns the same thing.
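As a side note, base R's `is.na()` has a data frame method that already returns a logical matrix, so the values themselves never have to be coerced to a common type. A small sketch under that assumption, outside of any pipe (`na_matrix` is an illustrative name):

```r
tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA),
                  stringsAsFactors = FALSE)

# is.na() on a data frame gives a logical matrix with one column per variable
na_matrix <- is.na(tab)
dim(na_matrix)

# Count the missing values per row and flag rows with at least one
tab$missing <- unname(rowSums(na_matrix) > 0)
tab$missing
```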

alistaire answered Oct 29 '22 14:10


This works for the example data:

library(tidyverse)

# tibble() replaces the deprecated data_frame()
tab <- tibble(a = 1:10, 
              b = c(NA, letters[2:10]), 
              c = c(LETTERS[1:9], NA))

tab_1 <- tab %>% mutate(missing = ifelse(is.na(b), TRUE, ifelse(is.na(c), TRUE, FALSE)))

> tab_1
    a    b    c missing
1   1 <NA>    A    TRUE
2   2    b    B   FALSE
3   3    c    C   FALSE
4   4    d    D   FALSE
5   5    e    E   FALSE
6   6    f    F   FALSE
7   7    g    G   FALSE
8   8    h    H   FALSE
9   9    i    I   FALSE
10 10    j <NA>    TRUE
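Since `is.na()` already returns a logical vector, the nested `ifelse()` can probably be reduced to a plain OR, and the same pattern extends to any number of columns with `Reduce()`. A sketch in base R; the column selection `cols` is an assumption made for illustration:

```r
tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA),
                  stringsAsFactors = FALSE)

# Same result as the nested ifelse(): TRUE when b or c is missing
missing_bc <- is.na(tab$b) | is.na(tab$c)

# Generalisation: OR together is.na() for every column listed in `cols`
cols <- c("a", "b", "c")
missing_any <- Reduce(`|`, lapply(tab[cols], is.na))

identical(missing_bc, missing_any)
```

This avoids hard-coding one `ifelse()` level per column.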
Rory Shaw answered Oct 29 '22 15:10


You can use the complete.cases function:

tab %>% mutate(missing = !complete.cases(.))

To remove rows with one or more NAs, use:

tab %>% filter(complete.cases(.))
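`complete.cases()` is a base R function, so the same check also works without a pipe, and it can be restricted to a subset of columns. A short sketch; the choice of columns `a` and `b` is only for illustration:

```r
tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA),
                  stringsAsFactors = FALSE)

# Flag rows with a missing value in any column
missing_all <- !complete.cases(tab)

# Flag rows with a missing value in columns a or b only
missing_ab <- !complete.cases(tab[, c("a", "b")])

# Keep only the rows without any missing value
tab_complete <- tab[complete.cases(tab), ]
nrow(tab_complete)
```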
wint3rschlaefer answered Oct 29 '22 13:10