ggplot2 does not seem to have a built-in way of dealing with overplotting for text on scatter plots. However, I have a different situation where the labels are those on a discrete axis and I'm wondering if someone here has a better solution than what I've been doing.
Some example code:
library(ggplot2)
#some example data
test.data = data.frame(text = c("A full commitment's what I'm thinking of",
"History quickly crashing through your veins",
"And I take A deep breath and I get real high",
"And again, the Internet is not something that you just dump something on. It's not a big truck."),
mean = c(3.5, 3, 5, 4),
CI.lower = c(3, 2.5, 4.5, 3.5),
CI.upper = c(4, 3.5, 5.5, 4.5))
#plot
ggplot(test.data, aes(x = text, y = mean)) +
geom_point() +
geom_errorbar(aes(ymax = CI.upper, ymin = CI.lower), width = .1) +
#the default labels are the values of text; an unnamed labels vector would be matched to the sorted breaks, not the data order
scale_x_discrete(name = "")
So we see that the x-axis labels are on top of each other. Two solutions spring to mind: 1) abbreviating the labels, and 2) adding newlines to the labels. In many cases (1) will do, but in some cases it cannot be done. So I wrote a function for adding a newline (\n) after every n characters of the strings to avoid overlapping names:
library(ggplot2)
#Inserts newlines into strings every N interval
new_lines_adder = function(test.string, interval){
#length of str
string.length = nchar(test.string)
#split by N char intervals
split.starts = seq(1,string.length,interval)
split.ends = c(split.starts[-1]-1,nchar(test.string))
#split it
test.string = substring(test.string, split.starts, split.ends)
#put it back together with newlines
test.string = paste0(test.string,collapse = "\n")
return(test.string)
}
#a user-level wrapper that also works on character vectors, data.frames, matrices and factors
add_newlines = function(x, interval) {
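#class(x) can have length > 1 (e.g. c("matrix", "array")), so use the is.*() tests
if (is.data.frame(x) || is.matrix(x) || is.factor(x)) {
x = as.character(unlist(x)) #flatten to a plain character vector
}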
if (length(x) == 1) {
return(new_lines_adder(x, interval))
} else {
t = sapply(x, FUN = new_lines_adder, interval = interval) #apply splitter to each
names(t) = NULL #remove names
return(t)
}
}
#plot again
ggplot(test.data, aes(x = text, y = mean)) +
geom_point() +
geom_errorbar(aes(ymax = CI.upper, ymin = CI.lower), width = .1) +
#pass a function so each label is derived from its own break
scale_x_discrete(labels = function(x) add_newlines(x, 20), name = "")
And the output is:
Then one can spend some time playing with the interval size to avoid having too much white-space between labels.
If the number of labels varies, this kind of solution is not so good, as the optimal interval size changes. Also, because the default font is not monospaced, the text of the labels affects their width too, so one has to take extra care in selecting a good interval (one can avoid this by using a monospaced font, but those are extra wide). Finally, the new_lines_adder() function is stupid in that it will split words in two in silly ways humans would not. E.g. in the above it split "breath" into "br\neath". One could rewrite it to avoid this problem.
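To make the mid-word split concrete, here is a small sketch that condenses the character-count splitter from above and runs it on the third label with an interval of 20. The condensed body is my own shorthand for the same logic, not a different algorithm.

```r
# Character-count splitter, condensed from the function in the question
new_lines_adder <- function(test.string, interval) {
  # start and end positions of each chunk of `interval` characters
  split.starts <- seq(1, nchar(test.string), interval)
  split.ends <- c(split.starts[-1] - 1, nchar(test.string))
  # cut the string into chunks and glue them back together with newlines
  paste0(substring(test.string, split.starts, split.ends), collapse = "\n")
}

wrapped <- new_lines_adder("And I take A deep breath and I get real high", 20)
cat(wrapped)
# And I take A deep br
# eath and I get real 
# high
```

The first chunk ends after exactly 20 characters, which happens to fall inside "breath".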
One can also decrease the font size, but this is a trade off with the readability and often decreasing the font size is unnecessary.
What is the best way of handling this kind of label overplotting?
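To avoid overlapping labels in ggplot2 (version 3.3.0 or later), we can use guide_axis() within scale_x_discrete(); its n.dodge argument staggers the labels over several rows and its angle argument rotates them.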
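A minimal sketch of the guide_axis() approach, assuming ggplot2 3.3.0 or later is installed. The demo.data data frame and its labels are made up for illustration; only guide_axis() and its n.dodge argument come from ggplot2 itself.

```r
library(ggplot2)

# Hypothetical stand-in data with long discrete labels
demo.data <- data.frame(
  text = c("A fairly long first label", "Another long second label",
           "Yet another long third label", "A final long fourth label"),
  mean = c(3.5, 3, 5, 4)
)

p <- ggplot(demo.data, aes(x = text, y = mean)) +
  geom_point() +
  # n.dodge = 2 staggers neighbouring labels over two rows
  scale_x_discrete(name = "", guide = guide_axis(n.dodge = 2))

print(p)
```

Whether two rows are enough depends on the label lengths and the plot width, so n.dodge may need adjusting, or it can be combined with wrapped labels.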
When we create a plot with base R's plot(), the Y-axis labels are generated automatically. To remove them, set yaxt = "n" to suppress the axis ticks and their labels, and ylab = "" to blank the axis title.
To alter the axis titles in ggplot2, add + labs(y = "y axis name", x = "x axis name") to your basic ggplot code. Note: + labs(title = "Title") is equivalent to ggtitle("Title").
I tried to put together a different version of new_lines_adder:
new_lines_adder = function(test.string, interval) {
#split at spaces
string.split = strsplit(test.string," ")[[1]]
# get length of snippets, add one for space
lens <- nchar(string.split) + 1
# now the trick: assign each word to a line based on where it ends,
# using buckets of interval + 1 characters (including the spaces)
lines <- cumsum(lens) %/% (interval + 1)
# construct the lines by joining the words of each bucket with spaces
test.lines <- tapply(string.split, lines, paste, collapse = " ")
# put everything into a single string, with one newline between lines
result <- paste(test.lines, collapse = "\n")
return(result)
}
It splits lines only at spaces and keeps each line to roughly the number of characters given by interval (a line can run slightly over when a word straddles a bucket boundary). With this, your plot looks as follows:
I wouldn't claim this to be the best way. It still ignores that not all characters have the same width. Maybe something better can be achieved using strwidth.
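As an alternative to hand-rolled splitting, base R's strwrap() already breaks text at whitespace so that each line has fewer than width characters. The sketch below swaps that in; wrap_labels is a hypothetical helper name, and like the versions above it counts characters, so it still ignores the rendered width of a proportional font.

```r
# Hypothetical helper: wrap each label at word boundaries using base R's strwrap()
wrap_labels <- function(x, width) {
  x <- as.character(x)  # also accepts factors
  vapply(
    x,
    # strwrap() returns one element per line; join them with newlines
    function(s) paste(strwrap(s, width = width), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
}

out <- wrap_labels("And I take A deep breath and I get real high", 20)
cat(out)
```

It can be passed to the scale as a function, e.g. scale_x_discrete(labels = function(x) wrap_labels(x, 20)), so that each label is computed from its own break.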
By the way: you can simplify add_newlines to the following:
add_newlines = function(x, interval) {
# make sure, x is a character array
x = as.character(x)
# apply splitter to each
t = sapply(x, FUN = new_lines_adder, interval = interval,USE.NAMES=FALSE)
return(t)
}
At the beginning, as.character makes sure you have a character vector. It does not hurt to do that even if you already have a character vector, so there is no need for the if clause.
The next if clause is also unnecessary: sapply works perfectly if x contains only one element. And you can suppress the names by setting USE.NAMES=FALSE, so you don't need to remove the names in an additional line.
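The three base R behaviours this simplification relies on can be checked in isolation. The inputs below are made-up examples, and nchar stands in for the splitter only to keep the sketch short.

```r
# as.character() turns a factor into plain strings ...
labels.factor <- factor(c("first label", "second label"))
from.factor <- as.character(labels.factor)

# ... and leaves an existing character vector unchanged
from.character <- as.character("already a string")

# sapply() works on a single element; by default it names the result after the input
named <- sapply("one", nchar)
# USE.NAMES = FALSE drops those names without an extra line
unnamed <- sapply("one", nchar, USE.NAMES = FALSE)

print(from.factor)
print(names(named))
print(names(unnamed))
```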