
geom_point: Put overlapping points with highest values on top of others

I am visualizing a panel dataset with geom_point where y = var1, x = year, and color = var2. The problem is that there are many overlapping points, even with horizontal jitter.

Reducing the point size or setting a low alpha value is undesirable because both reduce the visual impact of the second variable, which is strongly right-skewed. I would like ggplot to place the points with the highest values of var2 on top of all other overlapping points.

Reproducible example:

library(ggplot2)

# diamonds ships with ggplot2
df <- data.frame(diamonds)

ggplot(data = df, aes(x = factor(cut), y = carat, colour = price)) +
  geom_point(position = position_jitter(width = .4)) +
  scale_colour_gradientn(colours = c("grey20", "orange", "orange3"))

How does one place the points with highest values in df$price on top of an overlapping stack of points?

metasequoia asked Aug 04 '12 02:08


2 Answers

It looks as though grid plots in the order of the data,

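library(grid)

# Two overlapping points: blue is row 1, red is row 2
d <- data.frame(x=c(0.5,0.52),y=c(0.6,0.6), fill=c("blue","red"),
                stringsAsFactors=FALSE)

grid.newpage()
# Original row order: red is the last row
with(d,grid.points(x,y,default.units='npc', pch=21,gp=gpar(cex=5, fill=fill)))
# Reversed row order, shifted down by 0.2: blue is the last row
with(d[c(2,1),], grid.points(x,y-0.2,default.units='npc', pch=21,
                             gp=gpar(cex=5, fill=fill)))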

so I would suggest you first reorder your data.frame, and pray that ggplot2 won't mess with it :)

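library(ggplot2)
library(plyr)
# Sort ascending by price: later rows are drawn later, so the highest prices end up on top
df <- diamonds[order(diamonds$price), ]
# alternative with plyr
df <- arrange(diamonds, price)
# Swap the sorted data into the most recent plot (assumes the question's plot was just drawn)
last_plot() %+% df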
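
Rather than praying, the grid experiment can be repeated in ggplot2 itself. The following is a minimal sketch, not part of the original answer: the helper name plot_pair, the point size and the axis limits are all arbitrary choices made for illustration. If ggplot2 draws rows in data order, the red point should sit on top in the first plot and the blue point in the second.

```r
library(ggplot2)

# Two nearly coincident points, mirroring the grid example above
d <- data.frame(x = c(0.50, 0.52), y = c(0.6, 0.6),
                fill = c("blue", "red"), stringsAsFactors = FALSE)

# Same rows in reversed order
d_rev <- d[c(2, 1), ]

# Hypothetical helper: plots the rows in whatever order they are given
plot_pair <- function(dat) {
  ggplot(dat, aes(x = x, y = y, fill = fill)) +
    geom_point(shape = 21, size = 20) +
    scale_fill_identity() +   # use the colour names in `fill` literally
    xlim(0, 1) + ylim(0, 1)
}

p_original <- plot_pair(d)      # red is the last row
p_reversed <- plot_pair(d_rev)  # blue is the last row

print(p_original)
print(p_reversed)
```

Comparing the two printed plots shows directly whether row order controls which point ends up on top in the installed ggplot2 version.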
baptiste answered Oct 21 '22 13:10


In ggplot2 versions before 2.0.0, you can use the order aesthetic to specify the order in which points are plotted; the aesthetic was deprecated in ggplot2 2.0.0, so newer versions do not honour it. The last ones plotted will appear on top. To apply this, create a variable holding the order in which you'd like points to be drawn; in your case, rank(var2) should work.

For the reproducible example, to put the points with the highest df$price on top:

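library(ggplot2)

df <- data.frame(diamonds)
# Rank by price; ties are broken by row order so every point gets a unique rank
df$orderrank <- rank(df$price, ties.method = "first")

# The order aesthetic is only honoured by ggplot2 versions before 2.0.0
ggplot(data = df, aes(x = factor(cut), y = carat, colour = price, order = orderrank)) +
  geom_point(position = position_jitter(width = .4)) +
  scale_colour_gradientn(colours = c("grey20", "orange", "orange3"))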

Here is the difference in outputs between the example in the question and the chart with specified plotting order by price:

[Figure: comparison of unordered and ordered plots]

(The jittering makes the comparison a little less clear but the difference still comes across.)
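
Where the order aesthetic is unavailable, sorting the rows before plotting is one alternative. This is a sketch only, resting on the assumption (suggested by the first answer) that geom_point draws rows in data order; the name df_sorted is introduced here for illustration.

```r
library(ggplot2)

df <- data.frame(diamonds)

# Sort ascending by price so the most expensive diamonds are the last rows
df_sorted <- df[order(df$price), ]

p <- ggplot(data = df_sorted, aes(x = factor(cut), y = carat, colour = price)) +
  geom_point(position = position_jitter(width = .4)) +
  scale_colour_gradientn(colours = c("grey20", "orange", "orange3"))

print(p)
```

Sorting leaves the data itself unchanged apart from row order, so the axes and colour scale are the same as in the question's plot.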

Sam Firke answered Oct 21 '22 13:10