Is there an R library/function that would implement INTELLIGENT label placement in R plot? I tried some but they are all problematic - many labels are overlapping either each other or other points (or other objects in the plot, but I see that this is much harder to handle).
If not, is there any way how to COMFORTABLY help the algorithm with the label placement for particular problematic points? Most comfortable and efficient solution wanted.
You can play and test other possibilities with my reproducible example and see if you are able to achieve better results than I have:
# data
x = c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
0.9055, 1.3307)
y = c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
0.9717, 0.9357)
ShortSci = c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
"SaxRub", "TurMer", "TurPil", "TurPhi")
# basic plot
plot(x, y, asp=1)
abline(h = 1, col = "green")
abline(v = 1, col = "green")
For labelling, I then tried these possibilities, no one is really good:
this one is terrible:
text(x, y, labels = ShortSci, cex= 0.7, offset = 10)
this one is good if you don't want to place labels for all points, but just for the outliers, but still, the labels are often placed wrong:
identify(x, y, labels = ShortSci, cex = 0.7)
this one looked promissing but there is the problem of labels being too close to the points; I had to pad them with spaces but this doesn't help much:
require(maptools) pointLabel(x, y, labels = paste(" ", ShortSci, " ", sep=""), cex=0.7)
require(plotrix) thigmophobe.labels(x, y, labels = ShortSci, cex=0.7, offset=0.5)
require(calibrate)
textxy(x, y, labs=ShortSci, cx=0.7)
Thank you in advance!
EDIT: todo: try labcurve {Hmisc}.
To add the labels, we have text() , the first argument gives the X value of each point, the second argument the Y value (so R knows where to place the text) and the third argument is the corresponding label. The argument pos=1 is there to tell R to draw the label underneath the point; with pos=2 (etc.)
Labels for most fields are positioned immediately before the field, that is, for left-to-right languages, either to the left of the field or above it, and for right-to-left languages, to the right of the field or above it. Labels for radio buttons and checkboxes are positioned after the field.
First, here's the results of my solution to this problem:
I did this by hand in Preview (very basic PDF/image viewer on OS X) in just a few minutes. (Edit: The workflow was exactly what you'd expect: I saved the plot as a PDF from R, opened it in Preview and created textboxes with the desired labels (9pt Helvetica) and then just dragged them around with my mouse until they looked good. Then I exported to a PNG for uploading to SO.)
Now, before you succumb to the strong urge to down vote this into oblivion and leave snarky comments about how the point is to automate this process, hear me out!
Looking for algorithmic solutions is totally fine, and (IMHO) really interesting. But, to me, point labeling situations fall into roughly three categories:
text
) isn't that much effort.:climbing onto soapbox:
Since folks like us love automation, I think we often fall into the trap of thinking that nearly every aspect of producing a good statistical graphic ought to be automated. I respectfully (humbly!) disagree.
There is no perfectly general statistical plotting environment that automagically creates the picture you have in your head. Things like R, ggplot2, lattice etc. do most of the work; but that extra little bit of tweaking, adding a line here, adjusting a margin there, is probably better suited to a different tool.
:climbing down from soapbox:
I would also note that I think we could all come up with scatterplots with <10-15 points that will be nearly impossible to cleanly label, even by hand, and these will likely break any automatic solution someone comes up with.
Finally, I want to reiterate that I know this isn't the answer you're looking for. And I'm not saying that algorithmic attempts are useless or dumb. I up-voted this question, and will happily upvote interesting algorithmic solutions!
The reason I posted this answer is that I think this question ought to be the canonical "point labeling in R" question for future duplicates, and I think solutions involving hand-labeling deserve a seat at the table, that's all.
ggrepel
looks promising when applied to ggplot2
scatterplots.
# data
x = c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
0.9055, 1.3307)
y = c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
0.9717, 0.9357)
ShortSci = c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
"SaxRub", "TurMer", "TurPil", "TurPhi")
df <- data.frame(x = x, y = y, z = ShortSci)
library(ggplot2)
library(ggrepel)
ggplot(data = df, aes(x = x, y = y)) + theme_bw() +
geom_text_repel(aes(label = z),
box.padding = unit(0.45, "lines")) +
geom_point(colour = "green", size = 3)
Have you tried the directlabels package?
And, BTW, the pos and offset arguments can take vectors to allow you to get them in the right positions when there are a reasonable number of points in just a few runs of plot.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With