Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)

I have a data frame that looks like this:

    #df
    ID  DRUG  FED  AUC0t  Tmax  Cmax
    1   1     0    100    5     20
    2   1     1    200    6     25
    3   0     1    NA     2     30
    4   0     0    150    6     65

And so on. I want to summarize some statistics on AUC, Tmax and Cmax by drug (DRUG) and fed status (FED). I use dplyr. For example, for the AUC:

    CI90lo <- function(x) quantile(x, probs=0.05, na.rm=TRUE)
    CI90hi <- function(x) quantile(x, probs=0.95, na.rm=TRUE)

    summary <- df %>%
      group_by(DRUG, FED) %>%
      summarize(mean = mean(AUC0t, na.rm=TRUE),
                low  = CI90lo(AUC0t),
                high = CI90hi(AUC0t),
                min  = min(AUC0t, na.rm=TRUE),
                max  = max(AUC0t, na.rm=TRUE),
                sd   = sd(AUC0t, na.rm=TRUE))

However, the output is not grouped by DRUG and FED. It gives only a single row containing the statistics for the whole data set, not broken down by DRUG and FED.

Any idea why, and how can I make it do the right thing?

Amer asked Nov 14 '14 06:11


2 Answers

I believe you've loaded plyr after dplyr, which is why you are getting an overall summary instead of a grouped summary.

This is what happens with plyr loaded last.

    library(dplyr)
    library(plyr)

    df %>%
      group_by(DRUG, FED) %>%
      summarize(mean = mean(AUC0t, na.rm=TRUE),
                low  = CI90lo(AUC0t),
                high = CI90hi(AUC0t),
                min  = min(AUC0t, na.rm=TRUE),
                max  = max(AUC0t, na.rm=TRUE),
                sd   = sd(AUC0t, na.rm=TRUE))

      mean low high min max sd
    1  150 105  195 100 200 50

Now remove plyr and try again and you get the grouped summary.

    detach(package:plyr)

    df %>%
      group_by(DRUG, FED) %>%
      summarize(mean = mean(AUC0t, na.rm=TRUE),
                low  = CI90lo(AUC0t),
                high = CI90hi(AUC0t),
                min  = min(AUC0t, na.rm=TRUE),
                max  = max(AUC0t, na.rm=TRUE),
                sd   = sd(AUC0t, na.rm=TRUE))

    Source: local data frame [4 x 8]
    Groups: DRUG

      DRUG FED mean low high min max  sd
    1    0   0  150 150  150 150 150 NaN
    2    0   1  NaN  NA   NA  NA  NA NaN
    3    1   0  100 100  100 100 100 NaN
    4    1   1  200 200  200 200 200 NaN
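If you are not sure which package's summarize is being picked up in your session, a quick check along these lines may help. This is only a sketch using base R helpers (find, environment, environmentName); the variable names owner and locations are made up for illustration, and the results depend on what is attached in your own session.

```r
library(dplyr)

# Which namespace does the bare name `summarize` currently resolve to?
# With only dplyr attached this should be "dplyr"; if plyr was attached
# afterwards, it would report "plyr" instead.
owner <- environmentName(environment(summarize))
print(owner)

# Every place on the search path that defines `summarize`.
# The first entry is the one that wins when you call it unqualified.
locations <- find("summarize")
print(locations)
```

If the first entry in locations is package:plyr, the unqualified call is using plyr's version, which ignores the grouping set by group_by.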
aosmith answered Oct 12 '22 17:10


A variant of aosmith's answer that might help some folks out: tell R explicitly to use dplyr's functions by prefixing them with dplyr::. It's a good trick when one package interferes with another.

    df %>%
      dplyr::group_by(DRUG, FED) %>%
      dplyr::summarize(mean = mean(AUC0t, na.rm=TRUE),
                       low  = CI90lo(AUC0t),
                       high = CI90hi(AUC0t),
                       min  = min(AUC0t, na.rm=TRUE),
                       max  = max(AUC0t, na.rm=TRUE),
                       sd   = sd(AUC0t, na.rm=TRUE))
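For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The data frame is rebuilt by hand from the four rows shown in the question, only the mean is computed to keep it short, and the .groups argument assumes dplyr 1.0.0 or later (drop it on older versions).

```r
library(dplyr)

# Rebuild the four example rows from the question.
df <- data.frame(
  ID    = 1:4,
  DRUG  = c(1, 1, 0, 0),
  FED   = c(0, 1, 1, 0),
  AUC0t = c(100, 200, NA, 150),
  Tmax  = c(5, 6, 2, 6),
  Cmax  = c(20, 25, 30, 65)
)

# Namespace-qualified calls always use dplyr's versions,
# regardless of the order in which plyr and dplyr were attached.
res <- df %>%
  dplyr::group_by(DRUG, FED) %>%
  dplyr::summarise(mean = mean(AUC0t, na.rm = TRUE), .groups = "drop")

print(res)
```

The result should have one row per DRUG/FED combination rather than a single overall row. The group whose only AUC0t value is NA gets a mean of NaN.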
mmann1123 answered Oct 12 '22 18:10