 

Conditional replacement using dplyr's mutate_all

Tags: r, dplyr

library(tidyverse)
mytbl <- tibble(a = rep(c(1,1,0,1), 4), b = rep(c(1,0,0,1), 4))

    # A tibble: 16 × 2
           a     b
       <dbl> <dbl>
    1      1     1
    2      1     0
    3      0     0
    4      1     1
    5      1     1
    6      1     0
    7      0     0
    8      1     1
    9      1     1
    10     1     0
    11     0     0
    12     1     1
    13     1     1
    14     1     0
    15     0     0
    16     1     1

If I condition on the second column, all is well:

dplyr::mutate_all(mytbl, funs(replace(., b != 0, NA)))

    # A tibble: 16 × 2
           a     b
       <dbl> <dbl>
    1     NA    NA
    2      1     0
    3      0     0
    4     NA    NA
    5     NA    NA
    6      1     0
    7      0     0
    8     NA    NA
    9     NA    NA
    10     1     0
    11     0     0
    12    NA    NA
    13    NA    NA
    14     1     0
    15     0     0
    16    NA    NA

But if I condition on the first column, only the first column is replaced:

dplyr::mutate_all(mytbl, funs(replace(., a != 0, NA)))

    # A tibble: 16 × 2
           a     b
       <dbl> <dbl>
    1     NA     1
    2     NA     0
    3      0     0
    4     NA     1
    5     NA     1
    6     NA     0
    7      0     0
    8     NA     1
    9     NA     1
    10    NA     0
    11     0     0
    12    NA     1
    13    NA     1
    14    NA     0
    15     0     0
    16    NA     1

I am sure I am doing something wrong in my approach, and I could certainly do this in a non-dplyr way, but it seems like this should work. Extending the example with more columns gives a similar result.
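The non-dplyr way mentioned above could look like the following minimal base-R sketch (an assumption, not the asker's code). The condition is evaluated once against the unmodified data, and then every column is blanked in those rows. It uses a plain data.frame to avoid extra dependencies; the same indexing should also work on a tibble. The names rows_to_blank and out are illustrative only.

```r
# Base-R sketch: set every column to NA in the rows where a is non-zero.
mytbl <- data.frame(a = rep(c(1, 1, 0, 1), 4), b = rep(c(1, 0, 0, 1), 4))

# Evaluate the condition once, before anything is modified.
rows_to_blank <- mytbl$a != 0

out <- mytbl
# A single NA is recycled over all selected rows and columns;
# the columns stay numeric.
out[rows_to_blank, ] <- NA
```

Because rows_to_blank is computed before any assignment, later columns cannot be affected by changes to earlier ones.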

Asked Oct 24 '16 by JustBob81


1 Answer

I think (but have no proof ;)) this is because column a gets altered first, and the condition is then re-checked against the altered column. So when you do

dplyr::mutate_all(mytbl, funs(replace(., a != 0, NA)))

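column a is mutated first (so it no longer contains any non-zero values, only NA and 0). The condition a != 0 is then re-evaluated for column b, but it never returns TRUE: it is NA where a was replaced and FALSE where a was 0, and R skips NA positions when assigning a single value, so b is left unchanged. If you changed this to e.g.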

dplyr::mutate_all(mytbl, funs(replace(., a > 0, 10)))

it would give the desired behaviour, because a > 0 is still TRUE after a has been replaced with 10. Alternatively, you can try

dplyr::mutate_all(mytbl, funs(replace(., mytbl$a != 0, NA)))

which refers to the original, unmodified column a rather than the one being updated "on the fly", so it gives the desired result.
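To illustrate the explanation above, here is a minimal base-R sketch. It assumes, as the answer does, that the per-column expressions are evaluated one after another against the already-updated data. replace_sequentially() is a hypothetical helper that models this behaviour; it is not dplyr's actual implementation, and a plain data.frame stands in for the tibble.

```r
# Hypothetical helper modelling in-order, per-column evaluation.
# This is a simplification for illustration, NOT dplyr's implementation.
replace_sequentially <- function(df, cond, value) {
  for (col in names(df)) {
    # cond() sees the current df, including replacements already made
    # to earlier columns.
    idx <- cond(df)
    df[[col]] <- replace(df[[col]], idx, value)
  }
  df
}

mytbl <- data.frame(a = rep(c(1, 1, 0, 1), 4), b = rep(c(1, 0, 0, 1), 4))

# Condition on b: b is processed last, so both columns see the original b.
res_b <- replace_sequentially(mytbl, function(d) d$b != 0, NA)

# Condition on a: once a is replaced, a != 0 is NA or FALSE, never TRUE,
# and assigning a single value skips NA positions, so b is left unchanged.
res_na <- replace_sequentially(mytbl, function(d) d$a != 0, NA)

# Replacing with 10 instead: a > 0 is still TRUE after the update,
# so b is replaced in the same rows.
res_10 <- replace_sequentially(mytbl, function(d) d$a > 0, 10)
```

Under this model, res_b and res_na reproduce the two outputs shown in the question, and res_10 matches the a > 0 variant described in the answer.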

Answered Nov 15 '22 by konvas