It's common to put stars on barplots or boxplots to show the level of significance (p-value) of one or between two groups, below are several examples: <img src="https://i.stack.imgur.com/HlVYs.png" alt="enter image description here"><img src="https://i.stack.imgur.com/WEVho.png" alt="enter image description here"><img src="https://i.stack.imgur.com/IAM1z.png" alt="enter image description here"> The number of stars are defined by p-value, for example one can put 3 stars for p-value < 0.001, two stars for p-value < 0.01, and so on (although this changes from one article to the other). And my questions: How to generate similar charts? The methods that automatically put stars based on significance level are more than welcome.

I know that this is an old question and the answer by Jens Tierling already provides one solution for the problem. But I recently created a ggplot-extension that simplifies the whole process of adding significance bars: ggsignif Instead of tediously adding the <code>geom_line</code> and <code>geom_text</code> to your plot you just add a single layer <code>geom_signif</code>: <pre class="prettyprint"><code>library(ggplot2) library(ggsignif) ggplot(iris, aes(x=Species, y=Sepal.Length)) + geom_boxplot() + geom_signif(comparisons = list(c("versicolor", "virginica")), map_signif_level=TRUE) </code></pre> <img src="https://i.stack.imgur.com/ztl3U.png" alt="Boxplot with significance bar"> To create a more advanced plot similar to the one shown by Jens Tierling, you can do: <pre class="prettyprint"><code>dat <- data.frame(Group = c("S1", "S1", "S2", "S2"), Sub = c("A", "B", "A", "B"), Value = c(3,5,7,8)) ggplot(dat, aes(Group, Value)) + geom_bar(aes(fill = Sub), stat="identity", position="dodge", width=.5) + geom_signif(stat="identity", data=data.frame(x=c(0.875, 1.875), xend=c(1.125, 2.125), y=c(5.8, 8.5), annotation=c("**", "NS")), aes(x=x,xend=xend, y=y, yend=y, annotation=annotation)) + geom_signif(comparisons=list(c("S1", "S2")), annotations="***", y_position = 9.3, tip_length = 0, vjust=0.4) + scale_fill_manual(values = c("grey80", "grey20")) </code></pre> <img src="https://i.stack.imgur.com/OPktO.png" alt="enter image description here"> Full documentation of the package is available at CRAN.

There is also an extension of the ggsignif package called ggpubr that is more powerful when it comes to multi-group comparisons. It builds on top of ggsignif, but also handles anova and kruskal-wallis as well as pairwise comparisons against the gobal mean. Example: <pre class="prettyprint"><code>library(ggpubr) my_comparisons = list( c("0.5", "1"), c("1", "2"), c("0.5", "2") ) ggboxplot(ToothGrowth, x = "dose", y = "len", color = "dose", palette = "jco")+ stat_compare_means(comparisons = my_comparisons, label.y = c(29, 35, 40))+ stat_compare_means(label.y = 45) </code></pre> <img src="https://i.stack.imgur.com/WXTxr.png" alt="enter image description here">

I found this one is useful. <pre class="prettyprint"><code>library(ggplot2) library(ggpval) data("PlantGrowth") plt <- ggplot(PlantGrowth, aes(group, weight)) + geom_boxplot() add_pval(plt, pairs = list(c(1, 3)), test='wilcox.test') </code></pre>

Made my own function: <pre class="prettyprint"><code>ts_test <- function(dataL,x,y,method="t.test",idCol=NULL,paired=F,label = "p.signif",p.adjust.method="none",alternative = c("two.sided", "less", "greater"),...) { options(scipen = 999) annoList <- list() setDT(dataL) if(paired) { allSubs <- dataL[,.SD,.SDcols=idCol] %>% na.omit %>% unique dataL <- dataL[,merge(.SD,allSubs,by=idCol,all=T),by=x] #idCol!!! } if(method =="t.test") { dataA <- eval(parse(text=paste0( "dataL[,.(",as.name(y),"=mean(get(y),na.rm=T),sd=sd(get(y),na.rm=T)),by=x] %>% setDF" ))) res<-pairwise.t.test(x=dataL[[y]], g=dataL[[x]], p.adjust.method = p.adjust.method, pool.sd = !paired, paired = paired, alternative = alternative, ...) } if(method =="wilcox.test") { dataA <- eval(parse(text=paste0( "dataL[,.(",as.name(y),"=median(get(y),na.rm=T),sd=IQR(get(y),na.rm=T,type=6)),by=x] %>% setDF" ))) res<-pairwise.wilcox.test(x=dataL[[y]], g=dataL[[x]], p.adjust.method = p.adjust.method, paired = paired, ...) } #Output the groups res$p.value %>% dimnames %>% {paste(.[[2]],.[[1]],sep="_")} %>% cat("Groups ",.) #Make annotations ready annoList[["label"]] <- res$p.value %>% diag %>% round(5) if(!is.null(label)) { if(label == "p.signif"){ annoList[["label"]] %<>% cut(.,breaks = c(-0.1, 0.0001, 0.001, 0.01, 0.05, 1), labels = c("****", "***", "**", "*", "ns")) %>% as.character } } annoList[["x"]] <- dataA[[x]] %>% {diff(.)/2 + .[-length(.)]} annoList[["y"]] <- {dataA[[y]] + dataA[["sd"]]} %>% {pmax(lag(.), .)} %>% na.omit #Make plot coli="#0099ff";sizei=1.3 p <-ggplot(dataA, aes(x=get(x), y=get(y))) + geom_errorbar(aes(ymin=len-sd, ymax=len+sd),width=.1,color=coli,size=sizei) + geom_line(color=coli,size=sizei) + geom_point(color=coli,size=sizei) + scale_color_brewer(palette="Paired") + theme_minimal() + xlab(x) + ylab(y) + ggtitle("title","subtitle") #Annotate significances p <-p + annotate("text", x = annoList[["x"]], y = annoList[["y"]], label = annoList[["label"]]) return(p) } </code></pre> <h3>Data and call:</h3> <pre class="prettyprint"><code>library(ggplot2);library(data.table);library(magrittr); df_long <- rbind(ToothGrowth[,-2],data.frame(len=40:50,dose=3.0)) df_long$ID <- data.table::rowid(df_long$dose) ts_test(dataL=df_long,x="dose",y="len",idCol="ID",method="wilcox.test",paired=T) </code></pre> <h3>Result:</h3> <img src="https://i.stack.imgur.com/UGJIU.jpg" alt="enter image description here">

Put stars on ggplot barplots and boxplots - to indicate the level of significance (p-value)

5 Answers

I know that this is an old question and the answer by Jens Tierling already provides one solution for the problem. But I recently created a ggplot-extension that simplifies the whole process of adding significance bars: ggsignif

Instead of tediously adding the geom_line and geom_text to your plot you just add a single layer geom_signif:

library(ggplot2) library(ggsignif)  ggplot(iris, aes(x=Species, y=Sepal.Length)) +    geom_boxplot() +   geom_signif(comparisons = list(c("versicolor", "virginica")),                map_signif_level=TRUE)

Boxplot with significance bar

To create a more advanced plot similar to the one shown by Jens Tierling, you can do:

dat <- data.frame(Group = c("S1", "S1", "S2", "S2"),               Sub   = c("A", "B", "A", "B"),               Value = c(3,5,7,8))    ggplot(dat, aes(Group, Value)) +   geom_bar(aes(fill = Sub), stat="identity", position="dodge", width=.5) +   geom_signif(stat="identity",               data=data.frame(x=c(0.875, 1.875), xend=c(1.125, 2.125),                               y=c(5.8, 8.5), annotation=c("**", "NS")),               aes(x=x,xend=xend, y=y, yend=y, annotation=annotation)) +   geom_signif(comparisons=list(c("S1", "S2")), annotations="***",               y_position = 9.3, tip_length = 0, vjust=0.4) +   scale_fill_manual(values = c("grey80", "grey20"))

enter image description here

Full documentation of the package is available at CRAN.

answered Sep 28 '22 05:09

const-ae

Please find my attempt below.

Example plot

First, I created some dummy data and a barplot which can be modified as we wish.

windows(4,4)  dat <- data.frame(Group = c("S1", "S1", "S2", "S2"),                   Sub   = c("A", "B", "A", "B"),                   Value = c(3,5,7,8))    ## Define base plot p <- ggplot(dat, aes(Group, Value)) +     theme_bw() + theme(panel.grid = element_blank()) +     coord_cartesian(ylim = c(0, 15)) +     scale_fill_manual(values = c("grey80", "grey20")) +     geom_bar(aes(fill = Sub), stat="identity", position="dodge", width=.5)

Adding asterisks above a column is easy, as baptiste already mentioned. Just create a data.frame with the coordinates.

label.df <- data.frame(Group = c("S1", "S2"),                        Value = c(6, 9))  p + geom_text(data = label.df, label = "***")

To add the arcs that indicate a subgroup comparison, I computed parametric coordinates of a half circle and added them connected with geom_line. Asterisks need new coordinates, too.

label.df <- data.frame(Group = c(1,1,1, 2,2,2),                        Value = c(6.5,6.8,7.1, 9.5,9.8,10.1))  # Define arc coordinates r <- 0.15 t <- seq(0, 180, by = 1) * pi / 180 x <- r * cos(t) y <- r*5 * sin(t)  arc.df <- data.frame(Group = x, Value = y)  p2 <- p + geom_text(data = label.df, label = "*") +     geom_line(data = arc.df, aes(Group+1, Value+5.5), lty = 2) +     geom_line(data = arc.df, aes(Group+2, Value+8.5), lty = 2)

Lastly, to indicate comparison between groups, I built a larger circle and flattened it at the top.

r <- .5 x <- r * cos(t) y <- r*4 * sin(t) y[20:162] <- y[20] # Flattens the arc  arc.df <- data.frame(Group = x, Value = y)  p2 + geom_line(data = arc.df, aes(Group+1.5, Value+11), lty = 2) +      geom_text(x = 1.5, y = 12, label = "***")

answered Sep 28 '22 06:09

Jens Tierling

There is also an extension of the ggsignif package called ggpubr that is more powerful when it comes to multi-group comparisons. It builds on top of ggsignif, but also handles anova and kruskal-wallis as well as pairwise comparisons against the gobal mean.

Example:

library(ggpubr)

my_comparisons = list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )

ggboxplot(ToothGrowth, x = "dose", y = "len",
          color = "dose", palette = "jco")+ 
  stat_compare_means(comparisons = my_comparisons, label.y = c(29, 35, 40))+
  stat_compare_means(label.y = 45)

enter image description here

answered Sep 28 '22 06:09

Holger Brandl

I found this one is useful.

library(ggplot2)
library(ggpval)
data("PlantGrowth")
plt <- ggplot(PlantGrowth, aes(group, weight)) +
  geom_boxplot()
add_pval(plt, pairs = list(c(1, 3)), test='wilcox.test')

answered Sep 28 '22 04:09

Rashedul Islam

Made my own function:

ts_test <- function(dataL,x,y,method="t.test",idCol=NULL,paired=F,label = "p.signif",p.adjust.method="none",alternative = c("two.sided", "less", "greater"),...) {
    options(scipen = 999)

    annoList <- list()

    setDT(dataL)

    if(paired) {
        allSubs <- dataL[,.SD,.SDcols=idCol] %>% na.omit %>% unique
        dataL   <- dataL[,merge(.SD,allSubs,by=idCol,all=T),by=x]  #idCol!!!
    }

    if(method =="t.test") {
        dataA <- eval(parse(text=paste0(
                       "dataL[,.(",as.name(y),"=mean(get(y),na.rm=T),sd=sd(get(y),na.rm=T)),by=x] %>% setDF"
                       )))
        res<-pairwise.t.test(x=dataL[[y]], g=dataL[[x]], p.adjust.method = p.adjust.method,
                        pool.sd = !paired, paired = paired,
                        alternative = alternative, ...)
    }

    if(method =="wilcox.test") {
        dataA <- eval(parse(text=paste0(
            "dataL[,.(",as.name(y),"=median(get(y),na.rm=T),sd=IQR(get(y),na.rm=T,type=6)),by=x] %>% setDF"
        )))
        res<-pairwise.wilcox.test(x=dataL[[y]], g=dataL[[x]], p.adjust.method = p.adjust.method,
                             paired = paired, ...)
    }

    #Output the groups
    res$p.value %>% dimnames %>%  {paste(.[[2]],.[[1]],sep="_")} %>% cat("Groups ",.)

    #Make annotations ready
    annoList[["label"]] <- res$p.value %>% diag %>% round(5)

    if(!is.null(label)) {
        if(label == "p.signif"){
            annoList[["label"]] %<>% cut(.,breaks = c(-0.1, 0.0001, 0.001, 0.01, 0.05, 1),
                                         labels = c("****", "***", "**", "*", "ns")) %>% as.character
        }
    }

    annoList[["x"]] <- dataA[[x]] %>% {diff(.)/2 + .[-length(.)]}
    annoList[["y"]] <- {dataA[[y]] + dataA[["sd"]]} %>% {pmax(lag(.), .)} %>% na.omit

    #Make plot
    coli="#0099ff";sizei=1.3

    p <-ggplot(dataA, aes(x=get(x), y=get(y))) + 
        geom_errorbar(aes(ymin=len-sd, ymax=len+sd),width=.1,color=coli,size=sizei) +
        geom_line(color=coli,size=sizei) + geom_point(color=coli,size=sizei) + 
        scale_color_brewer(palette="Paired") + theme_minimal() +
        xlab(x) + ylab(y) + ggtitle("title","subtitle")


    #Annotate significances
    p <-p + annotate("text", x = annoList[["x"]], y = annoList[["y"]], label = annoList[["label"]])

    return(p)
}

Data and call:

library(ggplot2);library(data.table);library(magrittr);

df_long    <- rbind(ToothGrowth[,-2],data.frame(len=40:50,dose=3.0))
df_long$ID <- data.table::rowid(df_long$dose)

ts_test(dataL=df_long,x="dose",y="len",idCol="ID",method="wilcox.test",paired=T)

Result:

enter image description here

answered Sep 28 '22 06:09

Andre Elrico

Related questions
                            
                                How to remove repeated elements in a vector, similar to 'set' in Python
                            
                                What does %*% mean in R [duplicate]
                            
                                customize ggplot2 axis labels with different colors
                            
                                Merging multiple data.tables
                            
                                How to adjust the size of y axis labels only in R?
                            
                                How to check the OS within R [duplicate]
                            
                                Confusing error in R: Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 did not have 42 elements) [duplicate]
                            
                                as.numeric with comma decimal separators?
                            
                                Unable to update R packages in default library on Windows 7
                            
                                Error converting text to lowercase with tm_map(..., tolower)
                            
                                How to initialize a vector with fixed length in R
                            
                                How to add whitespace to an RMarkdown document?
                            
                                How to debug "contrasts can be applied only to factors with 2 or more levels" error?
                            
                                Difference between subset and filter from dplyr
                            
                                Remove numbers from alphanumeric characters
                            
                                Get date difference in years (floating point)
                            
                                Calculating percentile of dataset column
                            
                                Dictionary style replace multiple items
                            
                                replacing NA's with 0's in R dataframe [duplicate]
                            
                                Replace contents of factor column in R dataframe

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Put stars on ggplot barplots and boxplots - to indicate the level of significance (p-value)

Tags:

r

ggplot2

bar-chart

boxplot

p-value

Ali

People also ask

5 Answers

const-ae

Jens Tierling

Holger Brandl

Rashedul Islam

Data and call:

Result:

Andre Elrico

Recent Activity

Donate For Us