
Difference between dplyr::mutate and transform when using pmin and pmax?

Tags: r, dplyr

While trying to answer another question, I found that mutate and transform give different results in what I expected to be equivalent operations.

# data
x <- data.frame(a=c(rep(0,10),rep(1,10),3),b=c(1:10,0,11:19,0))

#transform
transform(x,a=pmin(a,b), b=pmax(a,b))
   a  b
1  0  1
2  0  2
3  0  3
4  0  4
5  0  5
6  0  6
7  0  7
8  0  8
9  0  9
10 0 10
11 0  1
12 1 11
13 1 12
14 1 13
15 1 14
16 1 15
17 1 16
18 1 17
19 1 18
20 1 19
21 0  3

#mutate
library(dplyr)
x %>% mutate(a=pmin(a,b), b=pmax(a,b))
   a  b
1  0  1
2  0  2
3  0  3
4  0  4
5  0  5
6  0  6
7  0  7
8  0  8
9  0  9
10 0 10
11 0  0
12 1 11
13 1 12
14 1 13
15 1 14
16 1 15
17 1 16
18 1 17
19 1 18
20 1 19
21 0  0

Note the differences in rows 11 and 21. I suspect that mutate modifies the data as it goes, so pmax sees the already-updated a rather than the original column. Is this correct? Is it a bug, or by design?

James asked Jul 14 '14 18:07




1 Answer

It appears my suspicion was correct: mutate evaluates its arguments in order, and this is by design, so that a computed variable can be used immediately afterwards, e.g.:

data.frame(a=1:4,b=5:8) %>% mutate(sum=a+b, letter=letters[sum])
  a b sum letter
1 1 5   6      f
2 2 6   8      h
3 3 7  10      j
4 4 8  12      l
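
The ordering effect can be reproduced in base R alone. The sketch below is only an emulation, not dplyr's actual implementation: it applies the two assignments one after another to a copy of a small, made-up data frame and compares the result with transform, which evaluates both expressions against the original columns. The names small, trans_res and seq_res are chosen here purely for illustration.

```r
# Hypothetical two-row data frame chosen so the two approaches disagree
small <- data.frame(a = c(1, 3), b = c(0, 0))

# transform(): both expressions are evaluated against the original columns
trans_res <- transform(small, a = pmin(a, b), b = pmax(a, b))

# Emulation of sequential evaluation: the second assignment sees the new a
seq_res <- small
seq_res$a <- pmin(seq_res$a, seq_res$b)
seq_res$b <- pmax(seq_res$a, seq_res$b)

print(trans_res)  # b holds the larger of the original a and b
print(seq_res)    # b is computed from the already-overwritten a
```

In the emulated sequential version, b can never exceed the original b wherever a has already been replaced by the row-wise minimum, which matches what rows 11 and 21 show in the mutate output above.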

To replicate the behaviour of transform, reference the columns of the original data frame directly:

x %>% mutate(a=pmin(x$a,x$b), b=pmax(x$a,x$b))
   a  b
1  0  1
2  0  2
3  0  3
4  0  4
5  0  5
6  0  6
7  0  7
8  0  8
9  0  9
10 0 10
11 0  1
12 1 11
13 1 12
14 1 13
15 1 14
16 1 15
17 1 16
18 1 17
19 1 18
20 1 19
21 0  3
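
One caveat with x$a is that it refers to the original object x, not to the data flowing through the pipe, so it may no longer line up if rows were filtered or reordered earlier in the chain. A possible alternative, sketched here under the assumption that a temporary column is acceptable (lo is a hypothetical name, not part of the approach above), is to store the minimum first and overwrite a last:

```r
library(dplyr)

# Same data as in the question
x <- data.frame(a = c(rep(0, 10), rep(1, 10), 3), b = c(1:10, 0, 11:19, 0))

res <- x %>%
  mutate(lo = pmin(a, b),   # minimum from the original a and b
         b  = pmax(a, b),   # a is still untouched at this point
         a  = lo) %>%       # overwrite a only after b has been computed
  select(-lo)               # drop the temporary column

# Compare with the transform() result from the question
expected <- transform(x, a = pmin(a, b), b = pmax(a, b))
all.equal(res, expected)
```

Because every expression here only reads columns that have not yet been overwritten, the order of evaluation no longer changes the outcome.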
James answered Dec 26 '22 07:12