
Value matching with NA - missing values - using mutate

Tags: r, dplyr

I am somewhat stuck. Is there a better way than the one below to match values within mutate while treating NAs as "real values"?

library(dplyr)

data_foo <- data.frame(A= c(1:2, NA, 4, NA), B = c(1, 3, NA, NA, 4))

Not the desired output:

data_foo %>% mutate(irr = A==B)

#>    A  B   irr
#> 1  1  1  TRUE
#> 2  2  3 FALSE
#> 3 NA NA    NA
#> 4  4 NA    NA
#> 5 NA  4    NA

data_foo %>% rowwise() %>% mutate(irr = A%in%B)

#> Source: local data frame [5 x 3]
#> Groups: <by row>
#> 
#> # A tibble: 5 x 3
#>       A     B irr  
#>   <dbl> <dbl> <lgl>
#> 1     1     1 TRUE 
#> 2     2     3 FALSE
#> 3    NA    NA FALSE
#> 4     4    NA FALSE
#> 5    NA     4 FALSE

Desired output: The code below produces the desired column, irr. I am using these somewhat cumbersome helper columns. Is there a shorter way?

data_foo %>% 
  mutate(NA_A = is.na(A), 
         NA_B = is.na(B), 
         irr = if_else(is.na(A)|is.na(B), NA_A == NA_B, A == B))

#>    A  B  NA_A  NA_B   irr
#> 1  1  1 FALSE FALSE  TRUE
#> 2  2  3 FALSE FALSE FALSE
#> 3 NA NA  TRUE  TRUE  TRUE
#> 4  4 NA FALSE  TRUE FALSE
#> 5 NA  4  TRUE FALSE FALSE
asked Jun 17 '19 16:06 by tjebo


2 Answers

Using map2

library(tidyverse)
data_foo %>%
   mutate(irr = map2_lgl(A, B, `%in%`))
#   A  B   irr
#1  1  1  TRUE
#2  2  3 FALSE
#3 NA NA  TRUE
#4  4 NA FALSE
#5 NA  4 FALSE

Or with setequal

data_foo %>% 
   rowwise %>%
   mutate(irr = setequal(A, B))

The methods above are concise, but they loop over the rows. A vectorized alternative is to replace NA with a sentinel value that does not occur in the data (here -999) and then compare with ==

data_foo %>%
     mutate_all(list(new = ~ replace_na(., -999))) %>%
     transmute(A, B, irr = A_new == B_new)
#   A  B   irr
#1  1  1  TRUE
#2  2  3 FALSE
#3 NA NA  TRUE
#4  4 NA FALSE
#5 NA  4 FALSE

Or with bind_cols and reduce

data_foo %>%
    mutate_all(replace_na, -999) %>% 
    reduce(`==`) %>% 
    bind_cols(data_foo, irr = .)
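
A sentinel such as -999 only works if that value can never occur in the data. As a side note, here is a minimal sketch of a vectorized comparison that needs no sentinel. The helper name na_equal is hypothetical (it is not part of dplyr or of the code above), and the sketch uses base R only:

```r
# Hypothetical helper (the name is an assumption): element-wise equality
# that treats two NAs as equal and NA vs. non-NA as unequal.
na_equal <- function(a, b) {
  both_na <- is.na(a) & is.na(b)
  # FALSE & NA is FALSE, so rows with exactly one NA yield FALSE here
  same_value <- !is.na(a) & !is.na(b) & a == b
  both_na | same_value
}

data_foo <- data.frame(A = c(1:2, NA, 4, NA), B = c(1, 3, NA, NA, 4))
data_foo$irr <- na_equal(data_foo$A, data_foo$B)
data_foo$irr
# With dplyr loaded, the same helper fits inside mutate():
# data_foo %>% mutate(irr = na_equal(A, B))
```

Because the helper is vectorized, it avoids both rowwise() and the extra *_new columns.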
answered Sep 29 '22 02:09 by akrun


Maybe simpler than akrun's answer?
Either of the two ways below produces the expected result. Note that as.character won't work, because as.character(NA) returns NA_character_, whereas paste and sQuote turn NA into a string.

data_foo %>%
  mutate(irr = paste(A) == paste(B))

data_foo %>%
  mutate(irr = sQuote(A) == sQuote(B))

#Source: local data frame [5 x 3]
#Groups: <by row>
#
## A tibble: 5 x 3
#      A     B irr  
#  <dbl> <dbl> <lgl>
#1     1     1 TRUE 
#2     2     3 FALSE
#3    NA    NA TRUE 
#4     4    NA FALSE
#5    NA     4 FALSE
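
One caveat with the string-based comparison, which does not affect the numeric columns in the question: once values are pasted, a real NA and the literal text "NA" can no longer be told apart. A small base R sketch with assumed character data (x and y are illustrative, not from the question):

```r
# Assumed example data: character vectors that contain the literal string "NA"
x <- c("NA", NA, "a")
y <- c(NA, "NA", "a")

# paste() turns a missing value into the string "NA",
# so a real NA and the text "NA" become indistinguishable
paste_equal <- paste(x) == paste(y)
paste_equal

# An explicit check keeps the two cases apart
strict_equal <- (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
strict_equal
```

Here paste_equal reports a match in every position, while strict_equal only matches the last one.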

Edit.

  1. Following the comments below I have updated the code and it now follows akrun's suggestion.
  2. There is also the excellent idea in tmfmnk's answer. I use a similar one in yet another way of solving the question's problem.

The documentation of all.equal says that

Do not use all.equal directly in if expressions—either use isTRUE(all.equal(....)) or identical if appropriate.

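Though there is no if expression in mutate, I believe that isTRUE(all.equal(...)) is more robust than identical, because it tolerates tiny floating-point differences, and it has the same effect when the values being compared are in fact equal. Note that all.equal compares whole vectors, so the comparison has to run row by row.

data_foo %>%
  rowwise() %>%
  mutate(irr = isTRUE(all.equal(A, B)))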
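
To see the difference between a whole-column and a per-row call of all.equal, here is a small base R sketch. Using mapply to walk over the pairs is an illustrative choice made here, not something taken from the answer:

```r
data_foo <- data.frame(A = c(1:2, NA, 4, NA), B = c(1, 3, NA, NA, 4))

# Applied to whole columns, all.equal() answers "are the two vectors equal?"
# and isTRUE() collapses that to a single logical value
whole <- isTRUE(all.equal(data_foo$A, data_foo$B))
whole

# Applied pair by pair, it gives one result per row
per_row <- mapply(function(a, b) isTRUE(all.equal(a, b)),
                  data_foo$A, data_foo$B)
per_row
```

Keep in mind that all.equal uses a numeric tolerance by default, so doubles that differ only by rounding error count as equal; that is a difference from == and identical.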
answered Sep 29 '22 01:09 by Rui Barradas