How to deal with ggplot2 and overlapping labels on a discrete axis

ggplot2 does not seem to have a built-in way of dealing with overplotting for text on scatter plots. However, I have a different situation where the labels are those on a discrete axis and I'm wondering if someone here has a better solution than what I've been doing.

Some example code:

library(ggplot2)

#some example data
test.data = data.frame(text = c("A full commitment's what I'm thinking of",
                                "History quickly crashing through your veins",
                                "And I take A deep breath and I get real high",
                                "And again, the Internet is not something that you just dump something on. It's not a big truck."),
                       mean = c(3.5, 3, 5, 4),
                       CI.lower = c(3, 2.5, 4.5, 3.5),
                       CI.upper = c(4, 3.5, 5.5, 4.5))

#plot
ggplot(test.data, aes(x = text, y = mean)) +
  geom_point(stat="identity") +
  geom_errorbar(aes(ymax = CI.upper, ymin = CI.lower), width = .1) +
  #no labels argument: an unnamed vector is matched to the (sorted) breaks by position and would mislabel the points
  scale_x_discrete(name = "")

[Figure: the resulting plot; the x-axis labels overlap]

So we see that the x-axis labels are on top of each other. Two solutions spring to mind: 1) abbreviating the labels, and 2) adding newlines to the labels. In many cases (1) will do, but in some cases it cannot be done. So I wrote a function that inserts a newline (\n) after every n characters of a string to avoid overlapping labels:

library(ggplot2)

#Inserts newlines into strings every N interval
new_lines_adder = function(test.string, interval){
  #length of str
  string.length = nchar(test.string)
  #split by N char intervals
  split.starts = seq(1,string.length,interval)
  split.ends = c(split.starts[-1]-1,nchar(test.string))
  #split it
  test.string = substring(test.string, split.starts, split.ends)
  #put it back together with newlines
  test.string = paste0(test.string,collapse = "\n")
  return(test.string)
}

#a user-level wrapper that also works on character vectors, data.frames, matrices and factors
add_newlines = function(x, interval) {
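  #is.*() tests avoid comparing class(x), which can have length > 1 (e.g. for matrices in R >= 4.0)
  if (is.data.frame(x)) {
    x = unlist(x, use.names = FALSE) #flatten all columns into one vector
  }
  if (is.matrix(x) || is.factor(x)) {
    x = as.character(x)
  }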

  if (length(x) == 1) {
    return(new_lines_adder(x, interval))
  } else {
    t = sapply(x, FUN = new_lines_adder, interval = interval) #apply splitter to each
    names(t) = NULL #remove names
    return(t)
  }
}

#plot again
ggplot(test.data, aes(x = text, y = mean)) +
  geom_point(stat="identity") +
  geom_errorbar(aes(ymax = CI.upper, ymin = CI.lower), width = .1) +
  #pass a function so each label is built from its own break, whatever the axis order
  scale_x_discrete(labels = function(x) add_newlines(x, 20), name = "")

And the output is:

[Figure: the same plot with the x-axis labels broken across several lines]

Then one can spend some time playing with the interval size to avoid having too much white-space between labels.

If the number of labels varies, this kind of solution is not so good, as the optimal interval size changes. Also, because the default font is not monospaced, the text of the labels affects their width too, so one has to take extra care in selecting a good interval (one can avoid this by using a monospaced font, but those are extra wide). Finally, the new_lines_adder() function is stupid in that it will split words in two in silly ways humans would not. E.g. in the above it split "breath" into "br\neath". One could rewrite it to avoid this problem.
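To see the mid-word split concretely, here is a minimal sketch that applies the same start/end arithmetic as new_lines_adder() to one of the labels, with an interval of 20. The variable names are only for illustration.

```r
# One of the labels from the example data, cut every 20 characters
label <- "And I take A deep breath and I get real high"
interval <- 20

# Same start/end arithmetic as in new_lines_adder()
starts <- seq(1, nchar(label), interval)
ends <- c(starts[-1] - 1, nchar(label))
pieces <- substring(label, starts, ends)

print(pieces)
# "And I take A deep br" "eath and I get real " "high"
```

The first cut lands inside "breath" simply because character 20 happens to fall there; nothing in the arithmetic looks at word boundaries.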

One can also decrease the font size, but this is a trade-off with readability, and often decreasing the font size is unnecessary.

What is the best way of handling this kind of label overplotting?

CoderGuy123 asked Jun 02 '15 14:06


1 Answer

I tried to put together a different version of new_lines_adder:

new_lines_adder = function(test.string, interval) {
   #split at spaces
   string.split = strsplit(test.string," ")[[1]]
   # get length of snippets, add one for space
   lens <- nchar(string.split) + 1
   # now the trick: a word goes on line k when its running end position
   # (spaces included) falls in the k-th block of interval + 1 characters
   lines <- cumsum(lens) %/% (interval + 1)
   # construct the lines
   test.lines <- tapply(string.split, lines, paste, collapse = " ")
   # put everything into a single string, without a trailing newline
   result <- paste(test.lines, collapse = "\n")
   return(result)
}

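It splits lines only at spaces and aims for lines of about interval characters. The limit is approximate: a word is assigned to a line by where it ends, so a line can run over by up to roughly the length of its first word. With this, your plot looks as follows: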
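To make the cumsum trick concrete, here is a small worked example on one of the labels with interval = 20. It uses only base R, and the variable names are illustrative rather than taken from the function above.

```r
label <- "And I take A deep breath and I get real high"
interval <- 20

words <- strsplit(label, " ")[[1]]
lens <- nchar(words) + 1             # word length plus one space
ends <- cumsum(lens)                 # running end position of each word
line_id <- ends %/% (interval + 1)   # block of 21 characters the word ends in

# One row per word, showing which line it is assigned to
print(data.frame(words, lens, ends, line_id))

# Glue the words of each line back together and measure the lines
wrapped <- tapply(words, line_id, paste, collapse = " ")
print(as.vector(wrapped))
# "And I take A deep" "breath and I get real" "high"
print(as.vector(nchar(wrapped)))
# 17 21 4
```

Note that the second line comes out at 21 characters: "breath" ends in the second block, so it is carried there whole, and the interval acts as a target rather than a hard limit.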

[Figure: the plot with the x-axis labels wrapped at spaces]

I wouldn't claim this to be the best way. It still ignores that not all characters have the same width. Maybe something better can be achieved using strwidth.
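As a rough sketch of the strwidth idea, the function below wraps greedily at spaces and measures each candidate line with graphics::strwidth() in inches instead of counting characters. The function name wrap_by_width and the max_inches parameter are hypothetical, and the widths come from the base graphics device's font, so they only approximate what ggplot2 will draw with its own theme font.

```r
# Hypothetical helper: greedy wrap at spaces, measuring rendered width
wrap_by_width <- function(s, max_inches, ...) {
  words <- strsplit(s, " ", fixed = TRUE)[[1]]
  lines <- character(0)
  current <- ""
  for (w in words) {
    candidate <- if (nzchar(current)) paste(current, w) else w
    too_wide <- strwidth(candidate, units = "inches", ...) > max_inches
    if (nzchar(current) && too_wide) {
      lines <- c(lines, current)  # close the current line
      current <- w                # start a new line with this word
    } else {
      current <- candidate        # the word still fits (or is alone on the line)
    }
  }
  paste(c(lines, current), collapse = "\n")
}

grDevices::pdf(NULL)  # throwaway device so strwidth() has font metrics
label <- "And I take A deep breath and I get real high"
wrapped <- wrap_by_width(label, max_inches = 1.5)
cat(wrapped, "\n")
invisible(grDevices::dev.off())
```

A single word wider than max_inches is left on a line of its own rather than being split, so the limit here is also a target and not a guarantee.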

By the way: you can simplify add_newlines to the following:

add_newlines = function(x, interval) {

   # make sure x is a character vector
   x = as.character(x)
   # apply splitter to each
   t = sapply(x, FUN = new_lines_adder, interval = interval,USE.NAMES=FALSE)
   return(t)
}

At the beginning, as.character makes sure you have a character vector. It does no harm if x already is one, so there is no need for the if clause.

Also the next if clause is unnecessary: sapply works perfectly if x contains only one element. And you can suppress the names by setting USE.NAMES=FALSE, such that you don't need to remove the names in an additional line.
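These points can be checked in isolation. In the sketch below, toupper() merely stands in for new_lines_adder(); any function of one string would behave the same way.

```r
# Single element: sapply still returns a length-1 result
one <- sapply("a", toupper)

# By default the inputs are attached as names ...
named <- sapply(c("a", "b"), toupper)
print(names(named))
# "a" "b"

# ... and USE.NAMES = FALSE leaves them off
bare <- sapply(c("a", "b"), toupper, USE.NAMES = FALSE)
print(names(bare))
# NULL

# as.character() turns a factor into its labels and leaves strings unchanged
fac <- as.character(factor(c("x", "y")))
same <- as.character(c("x", "y"))
```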

Stibu answered Sep 19 '22 12:09