I am visualizing a panel dataset with geom_point, where y = var1, x = year, and color = var2. The problem is that there are many overlapping points, even with horizontal jitter.
Reducing the point size or setting a low alpha value is undesirable because both reduce the visual impact of the second variable, which has a very long right skew. I would like ggplot to place the points with the highest values of var2 on top of all other overlapping points.
Reproducible example:
library(ggplot2)  # provides ggplot() and the diamonds data
df <- data.frame(diamonds)
ggplot(data = df, aes(x = factor(cut), y = carat, colour = price)) +
  geom_point(position = position_jitter(width = .4)) +
  scale_colour_gradientn(colours = c("grey20", "orange", "orange3"))
How does one place the points with the highest values of df$price on top of an overlapping stack of points?
If you use a scatter plot for a dataset that has discrete values in one dimension (for example, an x-axis showing the days of the week), points can overlap when you plot the data. To make the chart easier to interpret, you can add jitter to the data points.
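To avoid overlapping axis labels (rather than overlapping points) in ggplot2, you can pass guide = guide_axis() to scale_x_discrete(), e.g. guide_axis(n.dodge = 2) to stagger the labels.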
The jitter geom, geom_jitter(), is a convenient shortcut for geom_point(position = "jitter"). It adds a small amount of random variation to the location of each point, and is a useful way of handling overplotting caused by discreteness in smaller datasets.
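A minimal sketch of that equivalence, reusing the question's diamonds example: both layers below request the same jitter (0.4 horizontally, none vertically), once through position_jitter() and once through the geom_jitter() shortcut. The variable names p_long and p_short are only for illustration.

```r
library(ggplot2)

# Long form: geom_point() with an explicit jitter position
p_long <- ggplot(diamonds, aes(x = factor(cut), y = carat, colour = price)) +
  geom_point(position = position_jitter(width = 0.4, height = 0))

# Shortcut: geom_jitter() builds the same kind of jitter position itself
p_short <- ggplot(diamonds, aes(x = factor(cut), y = carat, colour = price)) +
  geom_jitter(width = 0.4, height = 0)
```

Note that height = 0 restricts the jitter to the horizontal direction; if height is left unset, a small vertical jitter is added as well.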
It looks as though grid draws points in the order of the data, so later rows end up on top:
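library(grid)
# Two large, partly overlapping points: blue in row 1, red in row 2
d <- data.frame(x = c(0.5, 0.52), y = c(0.6, 0.6), fill = c("blue", "red"),
                stringsAsFactors = FALSE)
grid.newpage()
# Original row order: red is drawn last, so it sits on top of blue
with(d, grid.points(x, y, default.units = "npc", pch = 21,
                    gp = gpar(cex = 5, fill = fill)))
# Reversed row order (drawn 0.2 npc lower): now blue is on top
with(d[c(2, 1), ], grid.points(x, y - 0.2, default.units = "npc", pch = 21,
                               gp = gpar(cex = 5, fill = fill)))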
so I would suggest that you first sort your data frame by increasing price, so that the most expensive diamonds come last and are drawn on top, and pray that ggplot2 won't mess with it :)
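library(ggplot2)
library(plyr)
# Increasing price: the most expensive diamonds are the last rows, drawn on top
df <- diamonds[order(diamonds$price), ]
# alternative with plyr
df <- arrange(diamonds, price)
# re-draw the question's plot (the last one created) with the reordered data
last_plot() %+% df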
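A self-contained sketch of the same reordering idea, using base R only (a dplyr alternative is shown as a comment, assuming dplyr is installed; plyr is retired). Because rows drawn later appear on top, the sort is in increasing order of price, so the highest prices end up in the last rows.

```r
library(ggplot2)

# Ascending sort: the highest prices end up in the last rows, and
# geom_point() draws rows in order, so those points land on top.
df_sorted <- diamonds[order(diamonds$price), ]

# Equivalent with dplyr (assumed to be installed):
# df_sorted <- dplyr::arrange(diamonds, price)

p <- ggplot(df_sorted, aes(x = factor(cut), y = carat, colour = price)) +
  geom_point(position = position_jitter(width = 0.4)) +
  scale_colour_gradientn(colours = c("grey20", "orange", "orange3"))
print(p)
```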
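In ggplot2 versions before 2.0.0, you could use the order aesthetic to specify the order in which points are plotted; the last ones plotted appear on top. To apply this, create a variable holding the order in which you'd like points to be drawn; in your case you might be able to specify rank(var2). (The order aesthetic was deprecated in ggplot2 2.0.0, so current versions ignore it, and sorting the data is the way to control the drawing order.)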
For the reproducible example, to put the points with the highest df$price on top:
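library(ggplot2)
df <- data.frame(diamonds)
# Unique ranks by price; the highest price gets the highest rank
df$orderrank <- rank(df$price, ties.method = "first")
# Note: the order aesthetic only has an effect in ggplot2 < 2.0.0
ggplot(data = df, aes(x = factor(cut), y = carat, colour = price, order = orderrank)) +
  geom_point(position = position_jitter(width = .4)) +
  scale_colour_gradientn(colours = c("grey20", "orange", "orange3"))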
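Since the order aesthetic was dropped in ggplot2 2.0.0, one way to get the same effect in current versions is a sketch like the following: compute the rank exactly as above, but use it to sort the rows instead of mapping it as an aesthetic. With ties.method = "first" every rank is unique, so the resulting order is fully determined.

```r
library(ggplot2)

df <- data.frame(diamonds)
# Unique ranks: ties in price are broken by original row position
df$orderrank <- rank(df$price, ties.method = "first")

# Instead of aes(order = orderrank), put the rows in rank order;
# the last row (highest rank, highest price) is drawn last, on top.
df <- df[order(df$orderrank), ]

p <- ggplot(df, aes(x = factor(cut), y = carat, colour = price)) +
  geom_point(position = position_jitter(width = 0.4)) +
  scale_colour_gradientn(colours = c("grey20", "orange", "orange3"))
print(p)
```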
Here is the difference in outputs between the example in the question and the chart with specified plotting order by price:
(The jittering makes the comparison a little less clear but the difference still comes across.)
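Part of the problem is that position_jitter() draws new random offsets for each plot, and sorting the rows changes which diamond receives which offset. A sketch of one workaround, under the assumption that only horizontal jitter is wanted: jitter the x position once as a data column (x_jit is a hypothetical helper column, jitter_plot a hypothetical helper function), then draw both versions from that column, so each point keeps the same position and only the drawing order differs. The seed 42 is arbitrary.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only so the jitter is reproducible
df <- data.frame(diamonds)
# Hypothetical helper column: cut position (1 to 5) plus horizontal jitter
# of up to +/- 0.4, computed once so each diamond keeps its offset
df$x_jit <- as.numeric(df$cut) + runif(nrow(df), min = -0.4, max = 0.4)

# Hypothetical helper: the question's plot, drawn from the pre-jittered column
jitter_plot <- function(data) {
  ggplot(data, aes(x = x_jit, y = carat, colour = price)) +
    geom_point() +
    scale_x_continuous(breaks = seq_along(levels(data$cut)),
                       labels = levels(data$cut)) +
    scale_colour_gradientn(colours = c("grey20", "orange", "orange3")) +
    labs(x = "cut")
}

df_sorted <- df[order(df$price), ]  # highest prices last, drawn on top

print(jitter_plot(df))         # original row order
print(jitter_plot(df_sorted))  # same positions, only the draw order differs
```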