Problem using rowwise() to count the number of NA's in each row of a dataframe

Question

I'm having trouble using rowwise() to count the number of NAs in each row. My minimal example:

df <- data.frame(Q1 = c(rep(1, 1), rep(NA, 9)),
                 Q2 = c(rep(2, 2), rep(NA, 8)),
                 Q3 = c(rep(3, 3), rep(NA, 7))
)
df
   Q1 Q2 Q3
1   1  2  3
2  NA  2  3
3  NA NA  3
4  NA NA NA
5  NA NA NA
6  NA NA NA
7  NA NA NA
8  NA NA NA
9  NA NA NA
10 NA NA NA

I would like to create a new column that counts the number of NAs in each row. I can do this very simply by writing

 df$Count_NA <- rowSums(is.na(df))
 df
   Q1 Q2 Q3 Count_NA
1   1  2  3        0
2  NA  2  3        1
3  NA NA  3        2
4  NA NA NA        3
5  NA NA NA        3
6  NA NA NA        3
7  NA NA NA        3
8  NA NA NA        3
9  NA NA NA        3
10 NA NA NA        3

But if I try and do this via dplyr using rowwise(), I get the wrong answer - the column Count_NA has the same number in each row:

df %>%
   rowwise() %>%
   mutate(Count_NA = sum(is.na(.)))
# A tibble: 10 x 4
# Rowwise: 
      Q1    Q2    Q3 Count_NA
   <dbl> <dbl> <dbl>    <int>
 1     1     2     3       24
 2    NA     2     3       24
 3    NA    NA     3       24
 4    NA    NA    NA       24
 5    NA    NA    NA       24
 6    NA    NA    NA       24
 7    NA    NA    NA       24
 8    NA    NA    NA       24
 9    NA    NA    NA       24
10    NA    NA    NA       24

what am I doing wrong, and how do i fix this?

Many thanks in advance

Thomas Philips

G. Grothendieck · Accepted Answer

Use cur_data() rather than dot. .[cur_group_id(), ], c(Q1, Q2, Q3), across() or c_across() (or c_across with the argument as per other answer) would also work.

Note that it is best to use ungroup afterwards or else it will retain the memory of the rowwise and you might get unexpected results later on.

df %>%
   rowwise() %>%
   mutate(Count_NA = sum(is.na(cur_data()))) %>%
   ungroup

giving:

# A tibble: 10 x 4
      Q1    Q2    Q3 Count_NA
   <dbl> <dbl> <dbl>    <int>
 1     1     2     3        0
 2    NA     2     3        1
 3    NA    NA     3        2
 4    NA    NA    NA        3
 5    NA    NA    NA        3
 6    NA    NA    NA        3
 7    NA    NA    NA        3
 8    NA    NA    NA        3
 9    NA    NA    NA        3
10    NA    NA    NA        3

r2evans · Answer

One issue is that . here resolves to the whole frame, not just the whole row. Another dplyr method, using c_across:

df %>%
    rowwise() %>%
    mutate(a=sum(is.na(c_across(everything()))))
# # A tibble: 10 x 4
# # Rowwise: 
#       Q1    Q2    Q3     a
#    <dbl> <dbl> <dbl> <int>
#  1     1     2     3     0
#  2    NA     2     3     1
#  3    NA    NA     3     2
#  4    NA    NA    NA     3
#  5    NA    NA    NA     3
#  6    NA    NA    NA     3
#  7    NA    NA    NA     3
#  8    NA    NA    NA     3
#  9    NA    NA    NA     3
# 10    NA    NA    NA     3

The biggest difference I can see between using this and cur_data() is that c_across allows for variable-selection a little more directly, as in c_across(starts_with("Q")). Granted, one could always select(cur_data(),...), so this is a weak argument.

AnilGoyal · Answer

baseR answer

df$Count_NA <- apply(df, 1, function(x) sum(is.na(x)))                 

df
   Q1 Q2 Q3 Count_NA
1   1  2  3        0
2  NA  2  3        1
3  NA NA  3        2
4  NA NA NA        3
5  NA NA NA        3
6  NA NA NA        3
7  NA NA NA        3
8  NA NA NA        3
9  NA NA NA        3
10 NA NA NA        3

So can be integrated into dplyr pipe

df %>% mutate(count_NA = apply(., 1, function(x) sum(is.na(x))))

   Q1 Q2 Q3 count_NA
1   1  2  3        0
2  NA  2  3        1
3  NA NA  3        2
4  NA NA NA        3
5  NA NA NA        3
6  NA NA NA        3
7  NA NA NA        3
8  NA NA NA        3
9  NA NA NA        3
10 NA NA NA        3

Waldi · Answer

rowSums works directly with mutate without rowwise:

df %>% mutate(count_NA = rowSums(is.na(.)))

   Q1 Q2 Q3 count_NA
1   1  2  3        0
2  NA  2  3        1
3  NA NA  3        2
4  NA NA NA        3
5  NA NA NA        3
6  NA NA NA        3
7  NA NA NA        3
8  NA NA NA        3
9  NA NA NA        3
10 NA NA NA        3

Note that your initial solution is by far the fastest one:

microbenchmark::microbenchmark(
  df$Count_NA <- rowSums(is.na(df)),
  df$Count_NA <- apply(df, 1, function(x) sum(is.na(x))),
  df %>% mutate(count_NA = rowSums(is.na(.))),
  df %>%
    mutate(Count_NA = purrr::pmap(., ~ sum(is.na(c(...))))),
  df %>%
    rowwise() %>%
    mutate(a=sum(is.na(c_across(everything())))),
  df %>%
  rowwise() %>%
  mutate(Count_NA = sum(is.na(cur_data()))) %>%
  ungroup
)

Unit: microseconds
                                                                            expr     min       lq
                                               df$Count_NA <- rowSums(is.na(df))    39.8    64.30
                          df$Count_NA <- apply(df, 1, function(x) sum(is.na(x)))  1661.6  1868.40
                                     df %>% mutate(count_NA = rowSums(is.na(.)))  1181.7  1572.80
                   df %>% mutate(Count_NA = purrr::pmap(., ~sum(is.na(c(...)))))  4749.9  5190.35
             df %>% rowwise() %>% mutate(a = sum(is.na(c_across(everything())))) 29124.1 31148.50
 df %>% rowwise() %>% mutate(Count_NA = sum(is.na(cur_data()))) %>%      ungroup 70473.0 73659.70
      mean   median       uq     max neval   cld
    79.033    76.25    88.75   174.0   100 a    
  2082.960  1966.50  2075.75  8777.3   100  b   
  1722.178  1676.20  1791.60  3112.9   100  b   
  5726.549  5396.40  5745.25 28592.1   100   c  
 33567.825 31983.05 33637.00 54676.9   100    d 
 77902.342 76492.85 81199.15 98942.1   100     e
Unit: microseconds
                                                                            expr     min       lq
                                               df$Count_NA <- rowSums(is.na(df))    38.2    44.95
                          df$Count_NA <- apply(df, 1, function(x) sum(is.na(x)))  1584.8  1765.30
                                     df %>% mutate(count_NA = rowSums(is.na(.)))  1247.9  1496.95
                   df %>% mutate(Count_NA = purrr::pmap(., ~sum(is.na(c(...)))))  4614.0  5110.50
 df %>% rowwise() %>% mutate(Count_NA = sum(is.na(cur_data()))) %>%      ungroup 67413.5 70865.45
      mean   median       uq      max neval cld
    71.159    65.85    84.40    162.2   100 a  
  1967.629  1894.45  2093.30   3436.6   100 ab 
  1814.193  1666.25  1895.35   9031.0   100 a  
  5796.483  5380.70  5665.10  15309.7   100  b 
 78309.807 75275.30 79776.40 286964.3   100   c

Problem using rowwise() to count the number of NA's in each row of a dataframe

Tags:

dataframe

r

dplyr

rowwise

Thomas Philips

4 Answers

G. Grothendieck

r2evans

AnilGoyal

Waldi

Recent Activity

Donate For Us

Problem using rowwise() to count the number of NA's in each row of a dataframe

Tags:

dataframe

r

dplyr

rowwise

Thomas Philips

4 Answers

G. Grothendieck

r2evans

AnilGoyal

Waldi

Related questions

Recent Activity

Donate For Us