
label specific point in ggplot2

Tags: r, ggplot2, labels

I'm trying to label individual points of interest in a specific scatter plot in ggplot2. My data exists as a csv file with multiple columns.

Gene       chr    start    stop      A      B       C       D      E
APOBEC3G   chr22  39472992 39483773  97.06  214.56  102.34  20.00  19.45  
APOBEC3C ... 

And so on and so forth. I am trying to plot column A v. column B via ggplot and I'm successful and can label all of the points with the corresponding gene name. However, how do I highlight (i.e. color, size change) individual genes of interest? (AKA: How do I make the data point for a list of 10 genes that I have on hand stand out? Or how can I annotate my genes of interest on the scatterplot without annotating all other points?)

I've tried using the subset function, but as a novice at R I've gotten a bit stranded.

Matt asked Sep 17 '15 21:09




1 Answer

You need to create a new variable that distinguishes the observations you want to highlight.

Let's simulate a data.frame:

df <- data.frame(genes=letters,
                 A=runif(26),
                 B=runif(26))

Your current plot should look like this (point + labels):

library(ggplot2)

ggplot(data=df,aes(x=A,y=B,label=genes)) +
  geom_point() +
  geom_text(hjust=-1,vjust=1)

In order to highlight some genes, we create a new variable, group. I assign "important" to some arbitrary genes. You may want to do this programmatically, by looking for outliers for instance.

df$group <- "not important"
df$group[df$genes %in% c("d","g","b")] <- "important"
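If you already have a list of gene names on hand, as in the question, the same flag can be built in one step with %in% and ifelse(). This is only a sketch: the small data frame below is a hypothetical stand-in for the CSV (only the Gene, A and B columns, with made-up values and placeholder names GENE3 to GENE5), and genes_of_interest is an assumed name for your own list.

```r
# Hypothetical stand-in for the data you would normally load with read.csv()
set.seed(1)
df <- data.frame(Gene = c("APOBEC3G", "APOBEC3C", "GENE3", "GENE4", "GENE5"),
                 A = runif(5),
                 B = runif(5))

# Assumed name: the vector of genes you want to highlight
genes_of_interest <- c("APOBEC3G", "APOBEC3C")

# Flag each row according to whether its gene is in the list
df$group <- ifelse(df$Gene %in% genes_of_interest, "important", "not important")

# Quick check of how many genes ended up in each group
table(df$group)
```

Genes in the list that do not appear in the data are silently ignored, so it can be worth comparing the table() counts against the length of your list.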

Now, there are two ways to separate the genes. The most idiomatic is to give each group its own colour (or shape, size, etc.): one for important genes, one for unimportant ones. This is easily achieved by mapping the new variable to colour (or size, shape, etc.):

ggplot(data=df,aes(x=A,y=B,label=genes)) +
  geom_point(aes(color=group)) +
  geom_text(hjust=-1,vjust=1)
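By default ggplot2 picks the colours for the two groups itself. If you want to choose them, one option is a manual scale; the sketch below also maps group to size. The specific colours and sizes are arbitrary choices for illustration, and the data frame is re-created so the snippet runs on its own.

```r
library(ggplot2)

# Re-create the simulated data and the grouping variable
set.seed(42)
df <- data.frame(genes = letters, A = runif(26), B = runif(26))
df$group <- ifelse(df$genes %in% c("d", "g", "b"), "important", "not important")

# Map group to both colour and size, then pick the values by hand
p <- ggplot(data = df, aes(x = A, y = B, label = genes)) +
  geom_point(aes(color = group, size = group)) +
  scale_color_manual(values = c("important" = "red", "not important" = "grey50")) +
  scale_size_manual(values = c("important" = 3, "not important" = 1.5)) +
  geom_text(hjust = -1, vjust = 1)
p
```

Because both aesthetics are mapped to the same variable, ggplot2 should combine them into a single legend.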


However, you could also plot each group on a separate layer to highlight the important genes more clearly. In that case, we first add all points, and then add a second geom_point layer that contains only the important genes, with special attributes (here, color and size).

ggplot(data=df,aes(x=A,y=B,label=genes)) +
  geom_point() +
  geom_point(data=df[df$group == "important",],color="red",size=3) +
  geom_text(hjust=-1,vjust=1)
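The same layering idea also covers the second part of the question, annotating only the genes of interest: give geom_text its own data argument containing just the important rows. This is a minimal sketch, not the only way to do it; the variable name important and the grey colour for the background points are illustrative choices.

```r
library(ggplot2)

# Re-create the simulated data and the grouping variable
set.seed(42)
df <- data.frame(genes = letters, A = runif(26), B = runif(26))
df$group <- ifelse(df$genes %in% c("d", "g", "b"), "important", "not important")

# Keep the highlighted rows in their own data frame
important <- df[df$group == "important", ]

p <- ggplot(data = df, aes(x = A, y = B)) +
  geom_point(color = "grey60") +                          # all genes, muted
  geom_point(data = important, color = "red", size = 3) + # highlighted genes
  geom_text(data = important, aes(label = genes),         # labels for those only
            hjust = -1, vjust = 1)
p
```

If the labels overlap when many genes are highlighted, the ggrepel package offers geom_text_repel() as a drop-in replacement for geom_text().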


scoa answered Nov 14 '22 20:11