
geom_bar ggplot2 stacked, grouped bar plot with positive and negative values - pyramid plot

I don't even know how to describe the plot I am trying to generate properly, which is not a great start. I will first show you my data, and then try to explain / show images that have elements of it.

My data:

   strain condition count.up count.down
1    phbA  balanced      120       -102
2    phbA   limited      114       -319
3    phbB  balanced      122       -148
4    phbB   limited       97       -201
5   phbAB  balanced      268       -243
6   phbAB   limited      140       -189
7    phbC  balanced       55        -65
8    phbC   limited      104       -187
9    phaZ  balanced       99        -28
10   phaZ   limited      147       -205
11   bdhA  balanced      246       -159
12   bdhA   limited      143       -383
13  acsA2  balanced      491       -389
14  acsA2   limited      131       -295

I have seven samples, each in two conditions. For each of these samples, I have the number of genes that are downregulated and the number of genes that are upregulated (count.down and count.up).

I want to plot this so that each sample is grouped; so phbA balanced is dodged beside phbA limited. Each bar would have a portion (representing the count.up #) in the positive side of the plot, and a portion (representing the count.down #) in the negative side of the plot.

I want the bars from the 'balanced' condition to be one colour, and the bars from the 'limited' condition to be another. Ideally, there would be two gradients of each colour (one for count.up and one for count.down), just to make a visual difference between the two parts of the bar.

Some images that have elements that I am trying to pull together:

I've also tried to apply some of the pieces of this Stack Overflow example, but I can't figure out how to make it work for my data set. I like the pos v. neg bars here: a single bar that covers both, and the colour differentiation of it. It does not have the grouping of conditions for one sample, or the extra layer of colour coding that differentiates condition.

I have tried a bunch of things, and I just can't get it right. I think I am really struggling because a lot of geom_bar examples use raw data that the plot counts itself, whereas I am giving it the counts directly. I don't seem to be able to successfully make that distinction in my code; when I switch to stat = "identity", everything gets messy. Any thoughts or suggestions would be very greatly appreciated!
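For reference, the distinction mentioned above can be sketched as follows. This is a minimal illustration on made-up toy data (the `raw` and `totals` data frames are hypothetical, not the data from this question): by default geom_bar() counts the rows for each x value, while stat = "identity" plots a y value that is already in the data.

```r
library(ggplot2)

# Hypothetical raw observations, one row per observation.
# geom_bar() counts the rows per strain itself (default stat = "count").
raw <- data.frame(strain = c("phbA", "phbA", "phbA", "phbB", "phbB"))
p_counted <- ggplot(raw, aes(strain)) +
  geom_bar()

# Hypothetical pre-computed totals, one row per bar.
# stat = "identity" tells geom_bar() to use the y value as the bar height.
totals <- data.frame(strain = c("phbA", "phbB"), n = c(3, 2))
p_identity <- ggplot(totals, aes(strain, n)) +
  geom_bar(stat = "identity")
```

Both plots should show the same two bars; only the place where the counting happens differs.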

Using the link suggested: So I've been playing around with that as a template, but I've gotten stuck.

df <- read.csv("countdata.csv", header=T) 
df.m <- melt(df, id.vars = c("strain", "condition")) 
ggplot(df.m, aes(condition)) +
  geom_bar(subset = .(variable == "count.up"), aes(y = value, fill = strain), stat = "identity") +
  geom_bar(subset = .(variable == "count.down"), aes(y = -value, fill = strain), stat = "identity") +
  xlab("") +
  scale_y_continuous("Export - Import", formatter = "comma")

When I tried to run the ggplot line, it returned an error: could not find function ".". I realized that I did not have dplyr installed/loaded, so I did that. Then I played around a lot and ended up with:

library(ggplot2)
library(reshape2)
library(dplyr)
library(plyr)

df <- read.csv("countdata.csv", header=T)
df.m <- melt(df, id.vars = c("strain", "condition"))

# This is what df.m looks like now. (Compared with my initial input df, I just
# changed the numbers in Excel so they are all positive.) Included so you can
# see what the melt does.
df.m <- read.table(text = "
strain condition   variable value
1    phbA  balanced   count.up   120
2    phbA   limited   count.up   114
3    phbB  balanced   count.up   122
4    phbB   limited   count.up    97
5   phbAB  balanced   count.up   268
6   phbAB   limited   count.up   140
7    phbC  balanced   count.up    55
8    phbC   limited   count.up   104
9    phaZ  balanced   count.up    99
10   phaZ   limited   count.up   147
11   bdhA  balanced   count.up   246
12   bdhA   limited   count.up   143
13  acsA2  balanced   count.up   491
14  acsA2   limited   count.up   131
15   phbA  balanced count.down   102
16   phbA   limited count.down   319
17   phbB  balanced count.down   148
18   phbB   limited count.down   201
19  phbAB  balanced count.down   243
20  phbAB   limited count.down   189
21   phbC  balanced count.down    65
22   phbC   limited count.down   187
23   phaZ  balanced count.down    28
24   phaZ   limited count.down   205
25   bdhA  balanced count.down   159 
26   bdhA   limited count.down   383
27  acsA2  balanced count.down   389
28  acsA2   limited count.down   295", header = TRUE)

This plots, by strain, the count.up and count.down values under both conditions:

ggplot(df.m, aes(strain)) +
  geom_bar(subset = .(variable == "count.up"), aes(y = value, fill = condition), stat = "identity") +
  geom_bar(subset = .(variable == "count.down"), aes(y = -value, fill = condition), stat = "identity") +
  xlab("")

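# left over from the template: reformats date-style labels such as 2005M01;
# the pattern does not match any strain name, and `labels` is not used below
labels <- gsub("20([0-9]{2})M([0-9]{2})", "\\2\n\\1",
               df.m$strain)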


# this adds a horizontal line at zero to improve readability
last_plot() + geom_hline(yintercept = 0, colour = "grey90")

The one thing I have not been able to get working (unfortunately) is how to display the number representing the 'value' inside each bar box. I've gotten the numbers to display, but I cannot get them in the right place. I'm going a little crazy!

My data is the same as above; this is where my code is at

I have looked at a ton of examples showing labels using geom_text on dodged plots. I have been unable to implement any successfully. The closest I've gotten is as follows - any suggestions would be appreciated!

library(ggplot2)
library(reshape2)
library(plyr)
library(dplyr)
df <- read.csv("countdata.csv", header=T)
df.m <- melt(df, id.vars = c("strain", "condition"))
ggplot(df.m, aes(strain), ylim(-500:500)) + 
geom_bar(subset = .(variable == "count.up"), 
aes(y = value, fill = condition), stat = "identity", position = "dodge") +
geom_bar(subset = .(variable == "count.down"), 
aes(y = -value, fill = condition), stat = "identity", position = "dodge") + 
geom_hline(yintercept = 0,colour = "grey90")

last_plot() + geom_text(aes(strain, value, group=condition, label=label, ymax = 500, ymin= -500), position = position_dodge(width=0.9),size=4)

Which gives this:

[image: resulting plot]

Why will you not align!

I suspect that my issue has to do with how I actually plotted, or the fact that I am not telling the geom_text command properly how to position itself. Any thoughts?

mdelow asked Jul 08 '16 14:07


1 Answer

Try this. Just as you position the bars with two statements (one for positive, one for negative), position the text in the same way. Then, fine-tune their positioning (inside the bar, or outside the bar) using vjust. Also, there is no 'label' variable in the data frame; the label, I assume, is value.

library(ggplot2)

## Using your df.m data frame (all values positive)
ggplot(df.m, aes(strain)) +
  # bars for the count.up values, drawn above zero
  geom_bar(data = subset(df.m, variable == "count.up"),
           aes(y = value, fill = condition), stat = "identity", position = "dodge") +
  # bars for the count.down values, drawn below zero
  geom_bar(data = subset(df.m, variable == "count.down"),
           aes(y = -value, fill = condition), stat = "identity", position = "dodge") +
  geom_hline(yintercept = 0, colour = "grey90")

## Add the labels in two layers, mirroring the two bar layers
last_plot() +
  geom_text(data = subset(df.m, variable == "count.up"),
            aes(strain, value, group = condition, label = value),
            position = position_dodge(width = 0.9), vjust = 1.5, size = 4) +
  geom_text(data = subset(df.m, variable == "count.down"),
            aes(strain, -value, group = condition, label = value),
            position = position_dodge(width = 0.9), vjust = -0.5, size = 4) +
  coord_cartesian(ylim = c(-500, 500))
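
The question also asks for a visual difference between the up and down parts of each bar. One possible way to approximate that, sketched here under the assumption that a lighter shade is acceptable in place of a true gradient, is to draw the negative layer with reduced opacity. The colour names, the alpha value of 0.6 and the axis label are illustrative choices, not part of the answer above; geom_col() is shorthand for geom_bar(stat = "identity"). The data frame is rebuilt inline from the numbers in the question so that the sketch runs on its own.

```r
library(ggplot2)

# Wide data from the question, with count.down stored as positive numbers
df <- data.frame(
  strain     = rep(c("phbA", "phbB", "phbAB", "phbC", "phaZ", "bdhA", "acsA2"), each = 2),
  condition  = rep(c("balanced", "limited"), times = 7),
  count.up   = c(120, 114, 122, 97, 268, 140, 55, 104, 99, 147, 246, 143, 491, 131),
  count.down = c(102, 319, 148, 201, 243, 189, 65, 187, 28, 205, 159, 383, 389, 295)
)

# Long format with the same columns as the melted df.m in the question
df.m <- rbind(
  data.frame(df[c("strain", "condition")], variable = "count.up",   value = df$count.up),
  data.frame(df[c("strain", "condition")], variable = "count.down", value = df$count.down)
)

up   <- subset(df.m, variable == "count.up")
down <- subset(df.m, variable == "count.down")

p <- ggplot(mapping = aes(x = strain, fill = condition)) +
  # full-strength colour above zero
  geom_col(data = up, aes(y = value), position = "dodge") +
  # same colour per condition, lighter shade below zero (alpha is an assumption)
  geom_col(data = down, aes(y = -value), position = "dodge", alpha = 0.6) +
  geom_hline(yintercept = 0, colour = "grey90") +
  # illustrative colours, one per condition
  scale_fill_manual(values = c(balanced = "steelblue", limited = "darkorange")) +
  labs(x = NULL, y = "genes up (+) / down (-)")

print(p)
```

The geom_text() layers from the answer can be added to `p` unchanged, since they use the same subsets and the same dodge width.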


Sandy Muspratt answered Nov 14 '22 23:11