Note: The title of this question has been edited to make it the canonical question for issues when plyr functions mask their dplyr counterparts. The rest of the question remains unchanged.
Suppose I have the following data:
dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)
With the good old plyr I can create a little table summarizing my data with the following code:
require(plyr)
ddply(dfx, .(group, sex), summarize,
      mean = round(mean(age), 2),
      sd = round(sd(age), 2))
The output looks like this:
  group sex  mean    sd
1     A   F 49.68  5.68
2     A   M 32.21  6.27
3     B   F 31.87  9.80
4     B   M 37.54  9.73
5     C   F 40.61 15.21
6     C   M 36.33 11.33
I'm trying to move my code to dplyr and the %>% operator. My code takes dfx, groups it by group and sex, and then summarises it. That is:
dfx %>% group_by(group, sex) %>%
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
But my output is:
   mean   sd
1 35.56 9.92
What am I doing wrong?
The problem here is that you are loading dplyr first and then plyr, so plyr's function summarise is masking dplyr's function summarise. When that happens you get this warning:
library(plyr)
Loading required package: plyr
------------------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------------------
Attaching package: ‘plyr’
The following objects are masked from ‘package:dplyr’:
arrange, desc, failwith, id, mutate, summarise, summarize
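If the warning has scrolled away, you can check which package a bare summarise() call will resolve to. The following is a minimal sketch using base R tools only; it assumes dplyr is installed, and the variable names owners and current_owner are just illustrative.

```r
library(dplyr)

# Attached packages that define `summarise`, in search-path order.
# The first entry is the one a bare summarise() call will use.
owners <- utils::find("summarise")
print(owners)

# Name of the namespace that the currently visible summarise() belongs to.
# If this prints "plyr" while you expect dplyr behaviour, masking is the cause.
current_owner <- environmentName(environment(summarise))
print(current_owner)
```

Base R's conflicts(detail = TRUE) gives a broader listing of every masked name on the search path.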
So in order for your code to work, either detach plyr with detach(package:plyr), or restart R and load plyr first and then dplyr (or load only dplyr):
library(dplyr)
dfx %>% group_by(group, sex) %>%
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
Source: local data frame [6 x 4]
Groups: group
  group sex  mean    sd
1     A   F 41.51  8.24
2     A   M 32.23 11.85
3     B   F 38.79 11.93
4     B   M 31.00  7.92
5     C   F 24.97  7.46
6     C   M 36.17  9.11
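The detach route can be made safe to run in any session by guarding it, since detach() raises an error when the package is not attached. A small sketch, assuming dplyr is installed:

```r
# Detach plyr only if it is currently on the search path;
# calling detach() on a package that is not attached is an error.
if ("package:plyr" %in% search()) {
  detach("package:plyr")
}

# Attach dplyr (a no-op if it is already attached).
library(dplyr)

# With plyr gone from the search path, the bare name resolves to dplyr.
print(environmentName(environment(summarise)))
```

Note that this only removes plyr from the search path. Its namespace can stay loaded if another package depends on it, which is harmless for this problem.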
Or you can explicitly call dplyr's summarise in your code, so the right function will be called no matter how you load the packages:
dfx %>% group_by(group, sex) %>%
  dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
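For completeness, here is a self-contained version of the namespace-qualified approach. It is a sketch under a few assumptions that are not in the original post: a fixed seed (set.seed(42)) so the random data are repeatable, the result stored in a variable named res, and the .groups argument, which needs dplyr 1.0.0 or later and can simply be removed on older versions.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so the sample data are repeatable
dfx <- data.frame(
  group = c(rep("A", 8), rep("B", 15), rep("C", 6)),
  sex   = sample(c("M", "F"), size = 29, replace = TRUE),
  age   = runif(n = 29, min = 18, max = 54)
)

# Qualify the verbs with dplyr:: so the result does not depend on
# whether plyr was attached before or after dplyr.
res <- dfx %>%
  dplyr::group_by(group, sex) %>%
  dplyr::summarise(
    mean = round(mean(age), 2),
    sd   = round(sd(age), 2),
    .groups = "drop"  # return an ungrouped result (dplyr >= 1.0.0)
  )

print(res)
```

The result should have one row per group and sex combination present in the data, rather than the single overall row that plyr's summarise returns.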