I'm trying to label individual points of interest in a specific scatter plot in ggplot2. My data exists as a csv file with multiple columns.
Gene chr start stop A B C D E
APOBEC3G chr22 39472992 39483773 97.06 214.56 102.34 20.00 19.45
APOBEC3C ...
And so on and so forth. I am plotting column A vs. column B with ggplot, and I can already label all of the points with the corresponding gene name. However, how do I highlight (i.e. change the color or size of) individual genes of interest? (AKA: How do I make the data points for a list of 10 genes that I have on hand stand out? Or how can I annotate my genes of interest on the scatterplot without annotating all other points?)
I've tried using the subset function, but as an R novice I've ended up a bit stranded.
To add labels at specified points, use annotate() with geom = "text" or geom = "label", i.e. annotate(geom = "text", ...) or annotate(geom = "label", ...). To position non-overlapping text labels automatically, see the ggrepel package.
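As a rough sketch of the annotate() approach for a single point: the data frame below is simulated, and the gene "d" is an arbitrary stand-in for a gene of interest, not something taken from the real data.

```r
library(ggplot2)

# Simulated stand-in for the real data (values are made up)
set.seed(1)
df <- data.frame(genes = letters, A = runif(26), B = runif(26))

# Look up the coordinates of the one gene to label ("d" is arbitrary)
target <- df[df$genes == "d", ]

p <- ggplot(df, aes(x = A, y = B)) +
  geom_point() +
  # annotate() takes fixed values rather than aes() mappings
  annotate(geom = "text", x = target$A, y = target$B,
           label = target$genes, vjust = -1, colour = "red")
print(p)
```

annotate() is convenient for one or two labels; for a list of ten genes, a separate data frame passed to geom_text() is probably less repetitive.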
In base R graphics, labels are added with text(): the first argument gives the x value of each point, the second the y value (so R knows where to place the text), and the third (labels) the corresponding label. The argument pos=1 tells R to draw the label underneath the point; pos=2, 3 and 4 place it to the left of, above and to the right of the point, respectively.
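For comparison, a minimal base R sketch that labels only a few points with text(). The data and the three chosen genes are simulated placeholders.

```r
# Simulated stand-in for the real data (values are made up)
set.seed(1)
df <- data.frame(genes = letters, A = runif(26), B = runif(26))

# Logical index of the genes to label (arbitrary choice)
keep <- df$genes %in% c("d", "g", "b")

plot(df$A, df$B, xlab = "A", ylab = "B", pch = 19)
# pos = 1 draws each label underneath its point
text(df$A[keep], df$B[keep], labels = df$genes[keep], pos = 1)
```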
To put labels directly in a ggplot2 plot, add the label data to the data frame, then use geom_text() or geom_label() to draw a label beside every data point. Both functions work the same way; the only difference is appearance: geom_label() draws a rectangle behind the text.
You need to create a new variable that distinguishes the observations you want to highlight.
Let's simulate a data.frame:
df <- data.frame(genes=letters,
A=runif(26),
B=runif(26))
Your current plot should look like this (point + labels):
library(ggplot2)
ggplot(data=df,aes(x=A,y=B,label=genes)) +
geom_point() +
geom_text(hjust=-1,vjust=1)
In order to highlight some genes, we create a new variable, group. I assign "important" to some arbitrary genes. You may want to do this programmatically, by looking for outliers for instance.
df$group <- "not important"
df$group[df$genes %in% c("d","g","b")] <- "important"
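If the genes of interest already live in a vector (for example, your list of 10 genes), the same grouping can be built in one step. This is only a sketch: genes_of_interest is a hypothetical name and the data is simulated.

```r
# Simulated stand-in for the real data (values are made up)
set.seed(1)
df <- data.frame(genes = letters, A = runif(26), B = runif(26))

# Hypothetical vector holding the genes you care about
genes_of_interest <- c("d", "g", "b")

# Flag each row according to whether its gene is in the list
df$group <- ifelse(df$genes %in% genes_of_interest,
                   "important", "not important")

# Quick sanity check of the group sizes
print(table(df$group))
```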
Now, there are two ways to separate the genes. The most idiomatic is to give each group its own colour (or shape, or size, etc.): one for important genes, one for unimportant ones. This is easily achieved by mapping the new variable to colour (or size, shape, etc.):
ggplot(data=df,aes(x=A,y=B,label=genes)) +
geom_point(aes(color=group)) +
geom_text(hjust=-1,vjust=1)
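By default ggplot2 picks the two colours for you. If you would rather choose them, one option is scale_colour_manual(). A sketch on the same kind of simulated data, where the specific colours are just an example:

```r
library(ggplot2)

# Simulated stand-in for the real data (values are made up)
set.seed(1)
df <- data.frame(genes = letters, A = runif(26), B = runif(26))
df$group <- ifelse(df$genes %in% c("d", "g", "b"),
                   "important", "not important")

p <- ggplot(df, aes(x = A, y = B, label = genes)) +
  geom_point(aes(colour = group)) +
  geom_text(hjust = -1, vjust = 1) +
  # Named vector: one colour per level of group (colours are arbitrary)
  scale_colour_manual(values = c("important" = "red",
                                 "not important" = "grey60"))
print(p)
```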
However, you could also plot each group on a separate layer to clearly highlight the important genes. In that case, we first add all points, and then add a new geom_point that contains only the important genes, with special attributes (here, color and size).
ggplot(data=df,aes(x=A,y=B,label=genes)) +
geom_point() +
geom_point(data=df[df$group == "important",],color="red",size=3) +
geom_text(hjust=-1,vjust=1)
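The same layering idea also covers the second half of the question: to annotate only the genes of interest, pass the subset to geom_text() as well. A sketch on simulated data; the three genes are an arbitrary choice.

```r
library(ggplot2)

# Simulated stand-in for the real data (values are made up)
set.seed(1)
df <- data.frame(genes = letters, A = runif(26), B = runif(26))
df$group <- ifelse(df$genes %in% c("d", "g", "b"),
                   "important", "not important")

# Rows for the genes of interest only
important <- df[df$group == "important", ]

p <- ggplot(df, aes(x = A, y = B)) +
  geom_point(colour = "grey60") +                         # all genes
  geom_point(data = important, colour = "red", size = 3) + # highlighted
  geom_text(data = important, aes(label = genes),          # labels for subset
            hjust = -1, vjust = 1)
print(p)
```

If the ggrepel package is installed, swapping geom_text() for ggrepel::geom_text_repel() (and dropping hjust/vjust) should help keep the labels from overlapping.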