Visualizing two or more data points where they overlap (ggplot R)

I have a scatterplot with colour-coded data points. When two or more data points overlap, only one of the colours is shown (whichever comes first in the legend). Each data point represents an item, and I need to show which items fall at each point on the scale. I'm using R (v3.3.1). Does anyone have suggestions for how I could show that there are multiple items at a given point on the scatterplot? Thanks in advance.

library(ggplot2)

pdf('pedplot.pdf', height = 6, width = 10)
p3 <- ggplot(data = e4, aes(x = domain, y = ped)) +
  geom_point(aes(color = Database_acronym), size = 3, shape = 17) +
  labs(x = "Domains", y = "Proportion of Elements per Domain",
       color = "Data Sources") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p3)
dev.off()
Cate asked Dec 23 '17 19:12

1 Answer

You could jitter the points, meaning add a bit of random noise to their positions to remove the overlap (probably the most commonly used option). Another option would be to use different marker shapes (plus a small size adjustment), chosen so that the markers remain visible when plotted on top of each other. This works only if you have two or three different marker types. A third option is to vary the size for each colour, again only for cases with maybe two or three colours/sizes, though the size difference might be confusing. If you can have multiple points of the same colour with the same coordinates, then only jitter (among the three options above) will make that apparent. In any case, here are examples of each approach:

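library(ggplot2)

# Three groups (A, B, C) that share exactly the same five (x, y) positions
dat <- data.frame(x=1:5, y=rep(1:5,3), group=rep(LETTERS[1:3],each=5))
theme_set(theme_bw())

# Jitter
set.seed(3)  # make the random jitter reproducible
ggplot(dat, aes(x,y, colour=group)) +
  geom_point(size=3, position=position_jitter(height=0.15, width=0.15))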

# Vary the marker size
ggplot(dat, aes(x,y, colour=group,size=group)) +
  geom_point() +
  scale_color_manual(values=c("red","blue","orange")) +
  scale_size_manual(values=c(5,3,1))

# Vary the marker shape (plus a small size adjustment)
ggplot(dat, aes(x,y, colour=group, size=group, shape=group)) +
  geom_point(stroke=1.5) +
  scale_colour_manual(values=c("black", "green", "orange")) +
  scale_shape_manual(values=c(19,17,4)) +
  scale_size_manual(values=c(4,3,3))
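
Applied to the plot in the question, the jitter approach might look roughly like the sketch below. The contents of e4 are not shown in the question, so the data frame here is invented purely for illustration; only the column names (domain, ped, Database_acronym) come from the original code. It also assumes that domain is categorical, which is why the points are jittered horizontally only (height = 0): each point then stays at its true ped value.

```r
library(ggplot2)

# Made-up stand-in for e4; only the column names come from the question.
e4 <- data.frame(
  domain = rep(c("Domain A", "Domain B", "Domain C"), times = 3),
  ped = c(0.2, 0.5, 0.3, 0.2, 0.5, 0.1, 0.2, 0.4, 0.3),
  Database_acronym = rep(c("DB1", "DB2", "DB3"), each = 3)
)

set.seed(1)  # make the random jitter reproducible
p3 <- ggplot(e4, aes(x = domain, y = ped, colour = Database_acronym)) +
  # Jitter sideways only, so the y position still shows the exact value
  geom_point(size = 3, shape = 17,
             position = position_jitter(width = 0.2, height = 0)) +
  labs(x = "Domains", y = "Proportion of Elements per Domain",
       colour = "Data Sources") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

print(p3)
```

With a discrete x axis there is empty space between the categories, so a horizontal spread of about 0.2 usually keeps the points clearly associated with their own domain. The width value is a guess and may need tuning for the real data.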

eipi10 answered Nov 09 '22 05:11