label specific point in ggplot2

Tags: r, ggplot2, labels

I'm trying to label individual points of interest in a specific scatter plot in ggplot2. My data exists as a csv file with multiple columns.

Gene       chr    start    stop      A      B       C       D      E
APOBEC3G   chr22  39472992 39483773  97.06  214.56  102.34  20.00  19.45  
APOBEC3C ...

And so on and so forth. I am trying to plot column A v. column B via ggplot and I'm successful and can label all of the points with the corresponding gene name. However, how do I highlight (i.e. color, size change) individual genes of interest? (AKA: How do I make the data point for a list of 10 genes that I have on hand stand out? Or how can I annotate my genes of interest on the scatterplot without annotating all other points?)

I've tried using the subset function, but as an R novice I've gotten a bit stuck.

asked Sep 17 '15 21:09

Matt

1 Answer

You need to create a new variable that distinguishes the observations you want to highlight.

Let's simulate a data.frame:

library(ggplot2)

# Simulated stand-in for the real data: 26 "genes" named a to z
df <- data.frame(genes=letters,
                 A=runif(26),
                 B=runif(26))

Your current plot should look like this (point + labels):

ggplot(data=df,aes(x=A,y=B,label=genes)) +
  geom_point() +
  geom_text(hjust=-1,vjust=1)

In order to highlight some genes, we create a new variable, group. I assign "important" to some arbitrary genes. You may want to do this programmatically, by looking for outliers for instance.

df$group <- "not important"
df$group[df$genes %in% c("d","g","b")] <- "important"
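
If the genes of interest are already on hand as a list, the same grouping can be built in one step. This is a minimal sketch, assuming the list is stored in a character vector; the name genes_of_interest is hypothetical, and the simulated data frame stands in for the real CSV data.

```r
# Simulated stand-in for the real data (assumption: gene names are in `genes`)
set.seed(1)
df <- data.frame(genes = letters, A = runif(26), B = runif(26))

# Hypothetical vector holding the genes to highlight
genes_of_interest <- c("d", "g", "b")

# Flag each row depending on whether its gene is in the list
df$group <- ifelse(df$genes %in% genes_of_interest,
                   "important", "not important")

# Quick check of how many genes fall in each group
table(df$group)
```

With a real list of 10 genes, only the contents of genes_of_interest would need to change.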

Now, there are two ways to separate the genes. The most idiomatic is to give each group its own colour (or shape, or size, etc.): one for important genes, one for unimportant ones. This is easily achieved by mapping the new variable to colour (or size, shape, etc.):

ggplot(data=df,aes(x=A,y=B,label=genes)) +
  geom_point(aes(color=group)) +
  geom_text(hjust=-1,vjust=1)
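
By default ggplot2 picks the two colours itself. If you want to choose them, one option is scale_color_manual. The sketch below assumes the same simulated data and group labels as above; the specific colours are arbitrary choices.

```r
library(ggplot2)

# Simulated stand-in for the real data, with the group variable added
set.seed(1)
df <- data.frame(genes = letters, A = runif(26), B = runif(26))
df$group <- ifelse(df$genes %in% c("d", "g", "b"),
                   "important", "not important")

# Named vector: one colour per value of `group` (colours chosen arbitrarily)
group_colours <- c("important" = "red", "not important" = "grey60")

p <- ggplot(data = df, aes(x = A, y = B, label = genes)) +
  geom_point(aes(color = group)) +
  geom_text(hjust = -1, vjust = 1) +
  scale_color_manual(values = group_colours)

p
```

Naming the elements of the colour vector ties each colour to a group value, so the mapping does not depend on the order of the levels.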

However, you could also plot each group on a separate layer to highlight the important genes more clearly. In that case, we first add all points, and then add a second geom_point layer that contains only the important genes, with special attributes (here, color and size).

ggplot(data=df,aes(x=A,y=B,label=genes)) +
  geom_point() +
  geom_point(data=df[df$group == "important",],color="red",size=3) +
  geom_text(hjust=-1,vjust=1)
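
The same layering idea can be applied to the labels, so that only the genes of interest are annotated. This is a sketch under the same assumptions as above (simulated data, arbitrary choice of important genes); the name highlight is just an illustrative helper.

```r
library(ggplot2)

# Simulated stand-in for the real data, with the group variable added
set.seed(1)
df <- data.frame(genes = letters, A = runif(26), B = runif(26))
df$group <- "not important"
df$group[df$genes %in% c("d", "g", "b")] <- "important"

# Subset containing only the rows to highlight and label
highlight <- df[df$group == "important", ]

p <- ggplot(data = df, aes(x = A, y = B)) +
  geom_point() +                                           # all genes
  geom_point(data = highlight, color = "red", size = 3) +  # important genes only
  geom_text(data = highlight, aes(label = genes),          # labels for important genes only
            hjust = -1, vjust = 1)

p
```

Because the label aesthetic is set only in the geom_text layer, and that layer receives only the subset, the remaining points are drawn without text.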

answered Nov 14 '22 20:11

scoa
