How to deal with ggplot2 and overlapping labels on a discrete axis

Tags: plot, r, ggplot2, axis-labels

ggplot2 does not seem to have a built-in way of dealing with overplotting for text on scatter plots. However, I have a different situation where the labels are those on a discrete axis and I'm wondering if someone here has a better solution than what I've been doing.

Some example code:

library(ggplot2)

#some example data
test.data = data.frame(text = c("A full commitment's what I'm thinking of",
                                "History quickly crashing through your veins",
                                "And I take A deep breath and I get real high",
                                "And again, the Internet is not something that you just dump something on. It's not a big truck."),
                       mean = c(3.5, 3, 5, 4),
                       CI.lower = c(3, 2.5, 4.5, 3.5),
                       CI.upper = c(4, 3.5, 5.5, 4.5))

#plot
ggplot(test.data, aes(x = text, y = mean)) +
  geom_point() +
  geom_errorbar(aes(ymax = CI.upper, ymin = CI.lower), width = .1) +
  scale_x_discrete(name = "")

[image: plot with overlapping x-axis labels]

So we see that the x-axis labels are on top of each other. Two solutions spring to mind: 1) abbreviating the labels, and 2) adding newlines to the labels. In many cases (1) will do, but in some cases it cannot be done. So I wrote a function that adds a newline (\n) after every n characters of each string to avoid overlapping names:

library(ggplot2)

#Inserts newlines into strings every N interval
new_lines_adder = function(test.string, interval){
  #length of str
  string.length = nchar(test.string)
  #split by N char intervals
  split.starts = seq(1,string.length,interval)
  split.ends = c(split.starts[-1]-1,nchar(test.string))
  #split it
  test.string = substring(test.string, split.starts, split.ends)
  #put it back together with newlines
  test.string = paste0(test.string,collapse = "\n")
  return(test.string)
}

#a user-level wrapper that also works on character vectors, data.frames, matrices and factors
add_newlines = function(x, interval) {
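  #inherits() is used because class(x) can have length > 1
  #(a matrix has class c("matrix", "array") in R >= 4.0)
  if (inherits(x, c("data.frame", "matrix", "factor"))) {
    x = as.character(unlist(x))
  }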

  if (length(x) == 1) {
    return(new_lines_adder(x, interval))
  } else {
    t = sapply(x, FUN = new_lines_adder, interval = interval) #apply splitter to each
    names(t) = NULL #remove names
    return(t)
  }
}

#plot again
ggplot(test.data, aes(x = text, y = mean)) +
  geom_point() +
  geom_errorbar(aes(ymax = CI.upper, ymin = CI.lower), width = .1) +
  #pass a function so each label is built from its own axis break
  scale_x_discrete(labels = function(x) add_newlines(x, 20), name = "")

And the output is: [image: the same plot with wrapped x-axis labels]

Then one can spend some time playing with the interval size to avoid having too much white-space between labels.

If the number of labels varies, this kind of solution is not so good, as the optimal interval size changes. Also, because the normal font is not mono-spaced, the text of the labels has an effect on the width too, and so one has to take extra care in selecting a good interval (one can avoid this by using a mono-spaced font, but those are extra wide). Finally, the new_lines_adder() function is stupid in that it will split words in two in silly ways humans would not. E.g. in the above it split "breath" into "br\neath". One could rewrite it to avoid this problem.

One can also decrease the font size, but this trades off against readability, and often decreasing the font size is unnecessary.

What is the best way of handling this kind of label overplotting?

319

asked Jun 02 '15 14:06

CoderGuy123

1 Answer

I tried to put together a different version of new_lines_adder:

new_lines_adder = function(test.string, interval) {
   #split at spaces
   string.split = strsplit(test.string," ")[[1]]
   # get length of snippets, add one for space
   lens <- nchar(string.split) + 1
   # now the trick: group the words into lines of roughly
   # interval + 1 characters (including the spaces)
   lines <- cumsum(lens) %/% (interval + 1)
   # construct the lines
   test.lines <- tapply(string.split, lines, paste, collapse = " ")
   # put everything into a single string, without a trailing newline
   result <- paste(test.lines, collapse = "\n")
   return(result)
}

It splits lines only at spaces and keeps them close to interval characters long. The limit is not strict: a word that straddles a boundary is moved to the next line in full, so that line can run past interval. With this, your plot looks as follows:

[image: the plot with labels wrapped at spaces]
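
To make the grouping step concrete, here is a small trace of the cumsum bucketing on made-up words. The input and the variable names are illustrative only and not part of the question's data. It also shows a case where a line ends up longer than interval.

```r
# Reproduce the grouping step of new_lines_adder() on a tiny, made-up input.
words <- c("aaaaa", "bbbbbb", "ccccccc")
interval <- 10

# Length of each word plus one for the following space: 6, 7, 8
lens <- nchar(words) + 1

# Running end position of each word, integer-divided by the line width.
# Words that share a bucket number end up on the same line.
bucket <- cumsum(lens) %/% (interval + 1)

# Glue the words of each bucket back together with spaces.
wrapped <- as.vector(tapply(words, bucket, paste, collapse = " "))

print(bucket)
print(wrapped)
print(nchar(wrapped))
```

The second line comes out as "bbbbbb ccccccc", which is 14 characters although interval is 10. The word "bbbbbb" starts before the boundary but ends after it, so it is counted entirely towards the next line.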

I wouldn't claim this to be the best way. It still ignores that not all characters have the same width. Maybe something better can be achieved using strwidth.
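
As a rough idea of what a strwidth-based version might look like, here is a minimal sketch that wraps greedily on the measured width of the text instead of the character count. The function name wrap_by_width and the parameter max_inches are made up for this illustration. Also note the assumption: strwidth measures with base graphics fonts on the current device, while ggplot2 draws with grid and the theme's font, so the numbers are only an approximation of what ends up on the axis.

```r
# Greedy word wrap based on rendered width (hypothetical helper, not from ggplot2).
# s          : a single string
# max_inches : widest allowed line, in inches, as measured by strwidth()
wrap_by_width <- function(s, max_inches, cex = 1) {
  words <- strsplit(s, " ", fixed = TRUE)[[1]]
  if (length(words) == 0) return("")
  lines <- character(0)
  current <- words[1]
  for (w in words[-1]) {
    candidate <- paste(current, w)
    if (strwidth(candidate, units = "inches", cex = cex) <= max_inches) {
      # the word still fits on the current line
      current <- candidate
    } else {
      # close the current line and start a new one with this word
      lines <- c(lines, current)
      current <- w
    }
  }
  paste(c(lines, current), collapse = "\n")
}

# strwidth() needs a graphics device; pdf(NULL) opens one without writing a file.
pdf(NULL)
wrapped <- wrap_by_width("And I take A deep breath and I get real high",
                         max_inches = 1.5)
invisible(dev.off())
cat(wrapped, "\n")
```

A single word that is wider than max_inches is kept whole on its own line, so the limit only holds for lines with more than one word.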

By the way: you can simplify add_newlines to the following:

add_newlines = function(x, interval) {

   # make sure x is a character vector
   x = as.character(x)
   # apply splitter to each
   t = sapply(x, FUN = new_lines_adder, interval = interval,USE.NAMES=FALSE)
   return(t)
}

At the beginning, as.character makes sure you have a character vector. Calling it on something that already is a character vector does no harm, so there is no need for the if clause.

Also the next if clause is unnecessary: sapply works perfectly if x contains only one element. And you can suppress the names by setting USE.NAMES=FALSE, such that you don't need to remove the names in an additional line.
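
If a hard upper limit on the line length is wanted, base R's strwrap might be an option. The following is a sketch only; wrap_labels is a name invented here. It returns a function, so it can be handed to the labels argument of a scale. strwrap keeps lines strictly shorter than its width argument, hence the + 1.

```r
# Build a labelling function that wraps at word boundaries with base R's strwrap().
# wrap_labels is a hypothetical helper name, not part of ggplot2.
wrap_labels <- function(interval) {
  function(x) {
    x <- as.character(x)
    vapply(
      x,
      function(s) {
        # strwrap() returns one element per output line of this string
        paste(strwrap(s, width = interval + 1), collapse = "\n")
      },
      character(1),
      USE.NAMES = FALSE
    )
  }
}

labeller <- wrap_labels(20)
cat(labeller("And I take A deep breath and I get real high"), "\n")
```

In the plot this would be used as scale_x_discrete(labels = wrap_labels(20), name = ""). Passing a function rather than a precomputed vector means ggplot2 applies it to the axis breaks themselves, so the labels cannot get out of step with the order of the rows in the data. Words longer than interval are not broken and will still exceed the limit.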

165

answered Sep 19 '22 12:09

Stibu
