Equivalent of apply() by row in the tidyverse?

I want to insert a new column into a data.frame whose value is TRUE when there is at least one missing value in the row and FALSE otherwise. For that problem, apply is a perfect use case:

EDIT - added example

tab <- data.frame(a = 1:10, b = c(NA, letters[2:10]), c = c(LETTERS[1:9], NA))

tab$missing <- apply(tab, 1, function(x) any(is.na(x)))

However, I loaded the strict package, and got this error: apply() coerces X to a matrix so is dangerous to use with data frames.Please use lapply() instead.

I know that I can safely ignore this error; however, I was wondering whether there is a simple way to code it using one of the tidyverse packages. I tried unsuccessfully with dplyr:

tab %>% 
  rowwise() %>% 
  mutate(missing = any(is.na(.), na.rm = TRUE))


asked Jul 06 '17 14:07

Kevin Zarca

3 Answers

If you want to avoid coercing to a matrix, you can use purrr::pmap, which iterates across the elements of a list in parallel and passes them to a function:

library(tidyverse)

# tibble() replaces the deprecated data_frame()
tab <- tibble(a = 1:10,
              b = c(NA, letters[2:10]),
              c = c(LETTERS[1:9], NA))

tab %>% mutate(missing = pmap_lgl(., ~any(is.na(c(...)))))
#> # A tibble: 10 x 4
#>        a     b     c missing
#>    <int> <chr> <chr>   <lgl>
#>  1     1  <NA>     A    TRUE
#>  2     2     b     B   FALSE
#>  3     3     c     C   FALSE
#>  4     4     d     D   FALSE
#>  5     5     e     E   FALSE
#>  6     6     f     F   FALSE
#>  7     7     g     G   FALSE
#>  8     8     h     H   FALSE
#>  9     9     i     I   FALSE
#> 10    10     j  <NA>    TRUE

In the function, c() is needed to combine all the arguments passed through ... into a single vector, which can then be passed to is.na() and collapsed with any(). The pmap_lgl() variant of pmap() simplifies the result to a logical vector.
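To see what the function actually receives, here is a minimal sketch, assuming purrr is installed. The example data is rebuilt as a plain data.frame so the snippet stands alone, and first_row and missing are illustrative names only. pmap() calls the function once per row with one named argument per column, and c() combines those arguments into a single vector, coercing them to a common type.

```r
library(purrr)

tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA),
                  stringsAsFactors = FALSE)

# Each call receives one row as the named arguments a, b and c;
# c(...) combines them into one vector (character here, because
# the integer column is coerced to match the character columns).
first_row <- pmap(tab, function(...) c(...))[[1]]
first_row
is.na(first_row)

# Collapse each row with any() and simplify to a logical vector
missing <- pmap_lgl(tab, function(...) any(is.na(c(...))))
missing
```

The coercion is harmless for this purpose, since is.na() only needs to know whether each element is missing.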

Note that while this avoids coercing to a matrix, it will not necessarily be faster than approaches that do, because matrix operations are highly optimized in R. It may make more sense to coerce to a matrix explicitly, e.g.

tab %>% mutate(missing = rowSums(is.na(as.matrix(.))) > 0)

which returns the same thing.
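A quick way to check that the results agree is to compare the explicit-matrix version with the original apply() call. This is only a sketch in base R on the example data; via_apply and via_rowsums are illustrative names.

```r
tab <- data.frame(a = 1:10,
                  b = c(NA, letters[2:10]),
                  c = c(LETTERS[1:9], NA),
                  stringsAsFactors = FALSE)

# apply() converts the data frame with as.matrix() before iterating over rows
via_apply <- apply(tab, 1, function(x) any(is.na(x)))

# The same conversion made explicit, then vectorised over the whole matrix
via_rowsums <- rowSums(is.na(as.matrix(tab))) > 0

# Compare values only, in case the two results carry different names
identical(unname(via_apply), unname(via_rowsums))
```

Because both versions go through as.matrix(), the coercion that strict warns about still happens; the rowSums() form simply makes it visible in the code.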

answered Oct 29 '22 14:10

alistaire

This works for the example data:

library(tidyverse)

# tibble() replaces the deprecated data_frame()
tab <- tibble(a = 1:10,
              b = c(NA, letters[2:10]),
              c = c(LETTERS[1:9], NA))

# is.na(b) | is.na(c) gives the same result as the nested ifelse() calls
tab_1 <- tab %>% mutate(missing = is.na(b) | is.na(c))

> tab_1
    a    b    c missing
1   1 <NA>    A    TRUE
2   2    b    B   FALSE
3   3    c    C   FALSE
4   4    d    D   FALSE
5   5    e    E   FALSE
6   6    f    F   FALSE
7   7    g    G   FALSE
8   8    h    H   FALSE
9   9    i    I   FALSE
10 10    j <NA>    TRUE

answered Oct 29 '22 15:10

Rory Shaw

You can use the complete.cases function:

tab %>% mutate(missing = !complete.cases(.))

To remove rows with one or more NAs, use:

tab %>% filter(complete.cases(.))
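As an alternative sketch that does not rely on the . placeholder of the magrittr pipe, dplyr's if_any() can express the same row-wise test. This assumes dplyr 1.0.4 or later, where if_any() is available in both mutate() and filter(). The names flagged and complete_rows are illustrative only.

```r
library(dplyr)

tab <- tibble(a = 1:10,
              b = c(NA, letters[2:10]),
              c = c(LETTERS[1:9], NA))

# if_any() applies is.na() to every selected column and combines
# the per-column results row by row with a logical OR
flagged <- tab %>% mutate(missing = if_any(everything(), is.na))

# Keep only the rows that have no missing values
complete_rows <- tab %>% filter(!if_any(everything(), is.na))
```

Swapping everything() for a narrower selection such as c(b, c) restricts the check to those columns.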

answered Oct 29 '22 13:10

wint3rschlaefer
