I have a data frame that looks like this:
    #df
    ID DRUG FED AUC0t Tmax Cmax
     1    1   0   100    5   20
     2    1   1   200    6   25
     3    0   1    NA    2   30
     4    0   0   150    6   65
And so on. I want to summarize some statistics on AUC, Tmax and Cmax by drug (DRUG) and fed status (FED). I use dplyr. For example, for AUC:
    # 5th and 95th percentiles, i.e. the bounds of the central 90% of the values
    CI90lo <- function(x) quantile(x, probs = 0.05, na.rm = TRUE)
    CI90hi <- function(x) quantile(x, probs = 0.95, na.rm = TRUE)

    summary <- df %>%
      group_by(DRUG, FED) %>%
      summarize(mean = mean(AUC0t, na.rm = TRUE),
                low  = CI90lo(AUC0t),
                high = CI90hi(AUC0t),
                min  = min(AUC0t, na.rm = TRUE),
                max  = max(AUC0t, na.rm = TRUE),
                sd   = sd(AUC0t, na.rm = TRUE))
However, the output is not grouped by DRUG and FED. It gives only one row, containing the statistics for all observations, instead of one row per DRUG and FED combination.
Any idea why, and how can I make it do the right thing?
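To make the example reproducible, here is a sketch that rebuilds the four rows shown above as a base R data frame. It assumes those rows are the whole data set and that every column is numeric.

```r
# Rebuild the example data from the question (assumed to be the complete data set)
df <- data.frame(
  ID    = 1:4,
  DRUG  = c(1, 1, 0, 0),
  FED   = c(0, 1, 1, 0),
  AUC0t = c(100, 200, NA, 150),  # subject 3 has a missing AUC
  Tmax  = c(5, 6, 2, 6),
  Cmax  = c(20, 25, 30, 65)
)
str(df)
```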
group_by() in R: the dplyr package provides group_by(), which groups a data frame by one or more columns so that functions such as mean(), sum(), n() (count), max() and min() are computed within each group.
group_by() groups the data contained in the data frame based on the columns passed to it as arguments. On its own it does not change the rows; it only records the grouping, which is why it is normally followed by summarise() (or another verb) that performs the actual computation. It works similarly to GROUP BY in SQL or a pivot table in Excel.
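One way to see that group_by() alone only records the grouping is to inspect a grouped data frame with group_vars() and n_groups(). This minimal sketch uses the built-in mtcars data set and assumes dplyr is installed.

```r
library(dplyr)

# group_by() keeps every row; it only attaches grouping metadata
by_gear <- mtcars %>% group_by(gear)

nrow(by_gear)        # still 32 rows, the same as mtcars
group_vars(by_gear)  # the grouping column: "gear"
n_groups(by_gear)    # one group per distinct gear value (3, 4 and 5)
```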
summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input.
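This rule explains the symptom in the question: when summarise() does not see the grouping, the result collapses to a single row. The sketch below compares the two cases on the DRUG, FED and AUC0t columns from the question; it assumes dplyr 1.0.0 or later, which provides the .groups argument.

```r
library(dplyr)

df <- data.frame(DRUG  = c(1, 1, 0, 0),
                 FED   = c(0, 1, 1, 0),
                 AUC0t = c(100, 200, NA, 150))

# No grouping variables: a single row summarising all observations
overall <- df %>% summarise(mean = mean(AUC0t, na.rm = TRUE))

# Two grouping variables: one row per DRUG x FED combination in the data
by_group <- df %>%
  group_by(DRUG, FED) %>%
  summarise(mean = mean(AUC0t, na.rm = TRUE), .groups = "drop")

overall   # 1 row
by_group  # 4 rows; the DRUG = 0, FED = 1 group has only NA, so its mean is NaN
```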
I'll begin by loading the dplyr package and then using group_by():

    library(dplyr)
    mtcars %>% group_by(gear)

By default, group_by() overrides any existing groups and creates new ones, and you can group by any column of the data.
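If you want to keep the existing groups instead of replacing them, group_by() accepts .add = TRUE. This sketch assumes dplyr 1.0.0 or later, where the argument is called .add.

```r
library(dplyr)

g1 <- mtcars %>% group_by(gear)
g2 <- g1 %>% group_by(cyl)               # default: replaces gear with cyl
g3 <- g1 %>% group_by(cyl, .add = TRUE)  # keeps gear and adds cyl

group_vars(g2)  # "cyl"
group_vars(g3)  # "gear" "cyl"
```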
Not sure if it's a recent addition, but I caught this recently when loading the two packages: "You have loaded plyr after dplyr - this is likely to cause problems. If you need functions from both plyr and dplyr, please load plyr first, then dplyr: library(plyr); library(dplyr)". This happens often because plyr exports functions with the same names as dplyr (such as summarise(), summarize(), mutate() and arrange()), and whichever package is loaded last masks the other's versions.
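To check which package an unqualified call will actually use, you can ask R where the name is found on the search path. This sketch uses find() from the utils package; the wrapper first_on_path() is a hypothetical helper name, not part of any package. Base R's conflicts(detail = TRUE) gives a broader list of masked names.

```r
library(dplyr)

# Hypothetical helper: find() lists every attached package that provides `fun`,
# in search-path order; the first entry is the one an unqualified call uses.
first_on_path <- function(fun) find(fun)[1]

first_on_path("summarize")  # "package:dplyr" here; "package:plyr" if plyr was attached after dplyr
```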
In addition to dplyr, users often use ggplot2 and, with it, ggpubr functions. ggpubr is another commonly used package that has a few incompatibilities with dplyr. In the same way, you can prefix the calls with dplyr:: (for example dplyr::summarize()), but if it still doesn't work, as happened to me, detaching the conflicting package should be enough.
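Calling detach() on a package that is not attached raises an error, so in a script it can help to guard the call. This is a minimal sketch with a hypothetical helper name, detach_if_attached(); it uses only base R.

```r
# Hypothetical helper: detach a package only if it is currently attached,
# so the call is safe whether or not the package was loaded in this session.
detach_if_attached <- function(pkg) {
  entry <- paste0("package:", pkg)
  if (entry %in% search()) {
    detach(entry, character.only = TRUE)
  }
  # TRUE when the package is no longer on the search path
  invisible(!(entry %in% search()))
}

detach_if_attached("plyr")
```

detach() only removes the package from the search path; its namespace may stay loaded, which is harmless here because only attached packages can mask unqualified calls such as summarize().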
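I believe you've loaded plyr after dplyr, which is why you are getting an overall summary instead of a grouped summary: the summarize() being called is plyr's, which knows nothing about dplyr's groups and summarises the whole data frame into a single row.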
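To see the difference without attaching either package (and therefore without any masking), you can call both versions through their namespace prefix. This sketch assumes dplyr 1.0.0 or later; the plyr part only runs if plyr is installed.

```r
df <- data.frame(DRUG  = c(1, 1, 0, 0),
                 FED   = c(0, 1, 1, 0),
                 AUC0t = c(100, 200, NA, 150))
grouped <- dplyr::group_by(df, DRUG, FED)

# dplyr's summarise() respects the grouping: one row per DRUG x FED group
with_dplyr <- dplyr::summarise(grouped, mean = mean(AUC0t, na.rm = TRUE),
                               .groups = "drop")
print(nrow(with_dplyr))  # 4

# plyr's summarise() ignores dplyr groups and collapses everything into one row
if (requireNamespace("plyr", quietly = TRUE)) {
  with_plyr <- plyr::summarise(grouped, mean = mean(AUC0t, na.rm = TRUE))
  print(nrow(with_plyr))  # 1
}
```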
This is what happens with plyr loaded last.
    library(dplyr)
    library(plyr)

    df %>%
      group_by(DRUG, FED) %>%
      summarize(mean = mean(AUC0t, na.rm = TRUE),
                low  = CI90lo(AUC0t),
                high = CI90hi(AUC0t),
                min  = min(AUC0t, na.rm = TRUE),
                max  = max(AUC0t, na.rm = TRUE),
                sd   = sd(AUC0t, na.rm = TRUE))

      mean low high min max sd
    1  150 105  195 100 200 50
Now detach plyr and try again, and you get the grouped summary.
    detach(package:plyr)

    df %>%
      group_by(DRUG, FED) %>%
      summarize(mean = mean(AUC0t, na.rm = TRUE),
                low  = CI90lo(AUC0t),
                high = CI90hi(AUC0t),
                min  = min(AUC0t, na.rm = TRUE),
                max  = max(AUC0t, na.rm = TRUE),
                sd   = sd(AUC0t, na.rm = TRUE))

    Source: local data frame [4 x 8]
    Groups: DRUG

      DRUG FED mean low high min max  sd
    1    0   0  150 150  150 150 150 NaN
    2    0   1  NaN  NA   NA  NA  NA NaN
    3    1   0  100 100  100 100 100 NaN
    4    1   1  200 200  200 200 200 NaN
A variant of aosmith's answer that might help some folks out: tell R to call dplyr's functions explicitly by prefixing them with dplyr::. This is a good trick when one package interferes with another.
    df %>%
      dplyr::group_by(DRUG, FED) %>%
      dplyr::summarize(mean = mean(AUC0t, na.rm = TRUE),
                       low  = CI90lo(AUC0t),
                       high = CI90hi(AUC0t),
                       min  = min(AUC0t, na.rm = TRUE),
                       max  = max(AUC0t, na.rm = TRUE),
                       sd   = sd(AUC0t, na.rm = TRUE))
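Since the question asks for the same statistics on AUC0t, Tmax and Cmax, one option is to apply a named list of functions to all three columns at once with across(). This is a sketch, not the answer's method; it assumes dplyr 1.0.0 or later and reuses the question's CI90lo/CI90hi helpers. The list name summary_funs and the NA-safe min/max wrappers are illustrative choices, not part of the original code.

```r
library(dplyr)

df <- data.frame(
  ID = 1:4, DRUG = c(1, 1, 0, 0), FED = c(0, 1, 1, 0),
  AUC0t = c(100, 200, NA, 150), Tmax = c(5, 6, 2, 6), Cmax = c(20, 25, 30, 65)
)

# The question's helpers: 5th and 95th percentiles
CI90lo <- function(x) quantile(x, probs = 0.05, na.rm = TRUE)
CI90hi <- function(x) quantile(x, probs = 0.95, na.rm = TRUE)

# Hypothetical list of summary functions. min/max return NA (instead of
# Inf/-Inf plus a warning) when a group has no non-missing values.
summary_funs <- list(
  mean = function(x) mean(x, na.rm = TRUE),
  low  = CI90lo,
  high = CI90hi,
  min  = function(x) if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE),
  max  = function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE),
  sd   = function(x) sd(x, na.rm = TRUE)
)

# across() names the output columns "<column>_<function>", e.g. AUC0t_mean
result <- df %>%
  dplyr::group_by(DRUG, FED) %>%
  dplyr::summarise(dplyr::across(c(AUC0t, Tmax, Cmax), summary_funs),
                   .groups = "drop")
result
```

The dplyr:: prefixes keep the pipeline working even if plyr is attached later in the session.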