R ggplot2 ggrepel - label a subset of points while being aware of all points

I have a rather dense scatterplot that I am constructing with R 'ggplot2', and I want to label a subset of points using 'ggrepel'. I want to plot ALL points in the scatterplot but only label a subset. When I do this, ggrepel doesn't account for the unlabelled points when calculating where to put the labels, so some labels end up on top of points that I don't want to label.

Here is an example plot illustrating the issue.

# generate data:
library(data.table)
library(stringi)
library(ggplot2)
library(ggrepel)
set.seed(20180918)
dt = data.table(
  name = stri_rand_strings(3000,length=6),
  one = rnorm(n = 3000,mean = 0,sd = 1),
  two = rnorm(n = 3000,mean = 0,sd = 1))
dt[, diff := one -two]
dt[, diff_cat := ifelse(one > 0 & two>0 & abs(diff)>1, "type_1",
                        ifelse(one<0 & two < 0 & abs(diff)>1, "type_2",
                               ifelse(two>0 & one<0 & abs(diff)>1, "type_3",
                                      ifelse(two<0 & one>0 & abs(diff)>1, "type_4", "other"))))]

# make plot
ggplot(dt, aes(x=one,y=two,color=diff_cat))+
  geom_point()

plot without labels

If I plot only the subset of points I want labelled, then ggrepel is able to place all of the labels in a non-overlapping fashion with respect to other points and labels.

ggplot(dt[abs(diff)>2 & (!diff_cat %in% c("type_3","type_4","other"))], 
  aes(x=one,y=two,color=diff_cat))+
  geom_point()+
  geom_text_repel(data = dt[abs(diff)>2 & (!diff_cat %in% c("type_3","type_4","other"))], 
                  aes(x=one,y=two,label=name))

plot labelled points only

However, when I plot this subset of labels on top of the full data set, the labels overlap points from the full data:

# now add labels to a subset of points on the plot
ggplot(dt, aes(x=one,y=two,color=diff_cat))+
  geom_point()+
  geom_text_repel(data = dt[abs(diff)>2 & (!diff_cat %in% c("type_3","type_4","other"))], 
                  aes(x=one,y=two,label=name))

plot with labels

How can I get the labels for the subset of points to not overlap the points from the original data?

Reilstein asked Sep 19 '18 02:09


1 Answer

You can try the following:

  1. Assign a blank label ("") to all the other points from the original data, so that geom_text_repel still takes those points into account and repels the visible labels away from them;
  2. Increase the box.padding parameter from the default 0.25 to some larger value, for greater distance between labels;
  3. Increase the x and y-axis limits, to give the labels more space at the four sides to repel towards.

Example code (with box.padding = 1):

library(dplyr)  # provides %>% and mutate()

ggplot(dt,
       aes(x = one, y = two, color = diff_cat)) +
  geom_point() +
  geom_text_repel(data = . %>% 
                    mutate(label = ifelse(diff_cat %in% c("type_1", "type_2") & abs(diff) > 2,
                                          name, "")),
                  aes(label = label), 
                  box.padding = 1,
                  show.legend = FALSE) + #this removes the 'a' from the legend
  coord_cartesian(xlim = c(-5, 5), ylim = c(-5, 5)) +
  theme_bw()

plot
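
If dplyr is not loaded, the same blank-label idea can be sketched by adding the label column to the data up front. This is only a minimal sketch and not part of the original answer: it builds a smaller stand-in data set with base R instead of data.table and stringi, the names `n`, `keep` and `p` are placeholders chosen here, and the plot is only drawn when ggplot2 and ggrepel are installed.

```r
# Stand-in data (assumption: 300 points and base-R random names,
# instead of the 3000 stringi names used in the question).
set.seed(20180918)
n <- 300
dt <- data.frame(
  name = replicate(n, paste(sample(letters, 6, replace = TRUE), collapse = "")),
  one  = rnorm(n),
  two  = rnorm(n),
  stringsAsFactors = FALSE
)
dt$diff <- dt$one - dt$two
dt$diff_cat <- with(dt,
  ifelse(one > 0 & two > 0 & abs(diff) > 1, "type_1",
  ifelse(one < 0 & two < 0 & abs(diff) > 1, "type_2",
  ifelse(two > 0 & one < 0 & abs(diff) > 1, "type_3",
  ifelse(two < 0 & one > 0 & abs(diff) > 1, "type_4", "other")))))

# Points to label keep their name; every other point gets an empty string,
# so it stays in the data passed to geom_text_repel() but shows no text.
keep <- dt$diff_cat %in% c("type_1", "type_2") & abs(dt$diff) > 2
dt$label <- ifelse(keep, dt$name, "")

# Draw the plot only if both plotting packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggrepel", quietly = TRUE)) {
  library(ggplot2)
  library(ggrepel)
  p <- ggplot(dt, aes(x = one, y = two, color = diff_cat)) +
    geom_point() +
    geom_text_repel(aes(label = label),
                    box.padding = 1,
                    show.legend = FALSE) +
    coord_cartesian(xlim = c(-5, 5), ylim = c(-5, 5)) +
    theme_bw()
  print(p)
}
```

Because the label column lives in the same data as the points, geom_text_repel() needs no separate data argument. Raising box.padding (for example to 2) should push the labels further apart. In newer ggrepel releases, labels in very crowded regions may be dropped unless max.overlaps is raised, so that argument may be worth checking if labels go missing.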

Here's another attempt, with box.padding = 2:

plot 2

(Note: I'm using ggrepel 0.8.0. I'm not sure whether all of this functionality is available in earlier package versions.)

Z.Lin answered Nov 13 '22 09:11