This is not a completely responsive answer but it is very simple. It illustrates an alternate method to display marginal densities and also how to use alpha levels for graphical output that supports transparency:
library(ggplot2)
# assumes numeric vectors x and y and a data frame xy <- data.frame(x, y)
scatter <- qplot(x,y, data=xy) +
scale_x_continuous(limits=c(min(x),max(x))) +
scale_y_continuous(limits=c(min(y),max(y))) +
geom_rug(col=rgb(.5,0,0,alpha=.2))
scatter
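qplot() is deprecated in recent ggplot2 releases, so here is a rough equivalent written with ggplot(). This is only a sketch: the simulated data and the object name scatter2 are assumptions for illustration, not part of the answer above.

```r
library(ggplot2)

set.seed(1)  # assumed example data
xy <- data.frame(x = rnorm(300), y = rt(300, df = 10))

scatter2 <- ggplot(xy, aes(x, y)) +
  geom_point() +
  # semi-transparent rug: overlapping ticks darken where the data are dense
  geom_rug(colour = rgb(0.5, 0, 0), alpha = 0.2)

scatter2
```

Setting alpha on the rug layer has the same effect as baking it into rgb(), and it keeps the colour and the transparency separately adjustable.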
This might be a bit late, but I decided to make a package (ggExtra) for this, since it involves a fair bit of code that is tedious to write. The package also tries to address some common issues, such as ensuring that the plots stay in line with one another even if there is a title or the text is enlarged.
The basic idea is similar to what the answers here gave, but it goes a bit beyond that. Here is an example of how to add marginal histograms to a random set of 1000 points. Hopefully this makes it easier to add histograms/density plots in the future.
Link to ggExtra package
library(ggplot2)
df <- data.frame(x = rnorm(1000, 50, 10), y = rnorm(1000, 50, 10))
p <- ggplot(df, aes(x, y)) + geom_point() + theme_classic()
ggExtra::ggMarginal(p, type = "histogram")
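Since density plots are mentioned as well, here is a minimal sketch of the same call with density curves on the margins. The seed and the name p_dens are assumptions added for illustration.

```r
library(ggplot2)

set.seed(42)  # assumed, only to make the example reproducible
df <- data.frame(x = rnorm(1000, 50, 10), y = rnorm(1000, 50, 10))
p <- ggplot(df, aes(x, y)) + geom_point() + theme_classic()

# density curves instead of histograms, on both margins
p_dens <- ggExtra::ggMarginal(p, type = "density", margins = "both")
p_dens
```

Setting margins = "x" or margins = "y" should restrict the marginal plot to one axis.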
The gridExtra package should work here. Start by making each of the ggplot objects:
library(ggplot2)
library(gridExtra)
hist_top <- ggplot()+geom_histogram(aes(rnorm(100)))
empty <- ggplot()+geom_point(aes(1,1), colour="white")+
theme(axis.ticks=element_blank(),
panel.background=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(),
axis.title.x=element_blank(), axis.title.y=element_blank())
scatter <- ggplot()+geom_point(aes(rnorm(100), rnorm(100)))
hist_right <- ggplot()+geom_histogram(aes(rnorm(100)))+coord_flip()
Then use the grid.arrange function:
grid.arrange(hist_top, empty, scatter, hist_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))
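Note that the snippet above draws fresh random numbers for every panel, so the histograms do not describe the points in the scatter. A minimal sketch of the same layout built from one shared data frame might look like this. The data frame d and the use of grid::nullGrob() for the blank cell are my own assumptions, not part of the answer.

```r
library(ggplot2)
library(gridExtra)

set.seed(1)
d <- data.frame(x = rnorm(100), y = rnorm(100))  # hypothetical data

scatter    <- ggplot(d, aes(x, y)) + geom_point()
hist_top   <- ggplot(d, aes(x)) + geom_histogram(bins = 20)
hist_right <- ggplot(d, aes(y)) + geom_histogram(bins = 20) + coord_flip()

# grid::nullGrob() stands in for the blank top-right cell
g <- arrangeGrob(hist_top, grid::nullGrob(), scatter, hist_right,
                 ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
grid::grid.newpage()
grid::grid.draw(g)
```

arrangeGrob() returns the layout as an object instead of drawing it immediately, which makes it easier to save or reuse.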
One addition, just to save some searching time for people doing this after us.
Legends, axis labels, axis text, and ticks make the plots drift away from each other, so the result looks ugly and inconsistent.
You can correct this with some of these theme settings,
+theme(legend.position = "none",
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
plot.margin = unit(c(3,-5.5,4,3), "mm"))
and align scales,
+scale_x_continuous(breaks = 0:6,
limits = c(0,6),
expand = c(.05,.05))
so the results will look OK.
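One way to keep the shared axis aligned is to define the scale once and add it to both panels. This is only a sketch: the data frame d, the name shared_x, and the choice of bin width are assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = runif(200, 0, 6), y = runif(200, 0, 6))  # hypothetical data

# one scale object, reused so both panels get identical limits and padding
shared_x <- scale_x_continuous(breaks = 0:6, limits = c(0, 6),
                               expand = c(.05, .05))

scatter <- ggplot(d, aes(x, y)) + geom_point() + shared_x

hist_top <- ggplot(d, aes(x)) +
  # bin edges at 0, 0.5, ..., 6 so that no bar crosses the scale limits
  geom_histogram(binwidth = 0.5, boundary = 0) +
  shared_x +
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_blank())
```

Aligning the bin boundary with the limits matters because bars that extend past a scale's limits can be dropped from the plot.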
Just a very minor variation on BondedDust's answer, in the general spirit of marginal indicators of distribution.
Edward Tufte has called this use of rug plots a 'dot-dash plot', and has an example in VDQI of using the axis lines to indicate the range of each variable. In my example the axis labels and grid lines also indicate the distribution of the data. The labels are located at the values of Tukey's five number summary (minimum, lower-hinge, median, upper-hinge, maximum), giving a quick impression of the spread of each variable.
These five numbers are thus a numerical representation of a boxplot. It's a bit tricky because the unevenly spaced grid-lines suggest that the axes have a non-linear scale (in this example they are linear). Perhaps it would be best to omit grid lines or force them to be in regular locations, and just let the labels show the five number summary.
x<-rnorm(300)
y<-rt(300,df=10)
xy<-data.frame(x,y)
require(ggplot2); require(grid)
# make the basic plot object
ggplot(xy, aes(x, y)) +
# set the locations of the x-axis labels as Tukey's five numbers
scale_x_continuous(limits=c(min(x), max(x)),
breaks=round(fivenum(x),1)) +
# ditto for y-axis labels
scale_y_continuous(limits=c(min(y), max(y)),
breaks=round(fivenum(y),1)) +
# specify points
geom_point() +
# specify that we want the rug plot
geom_rug(size=0.1) +
# improve the data/ink ratio
theme_minimal(base_size = 18)
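The variation suggested above, dropping the unevenly spaced grid lines and letting only the labels carry the five-number summary, could be sketched as follows. The names brks_x, brks_y and p_fivenum, the seed, and the rug transparency are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
xy <- data.frame(x = rnorm(300), y = rt(300, df = 10))

# label positions: minimum, lower hinge, median, upper hinge, maximum
brks_x <- round(fivenum(xy$x), 1)
brks_y <- round(fivenum(xy$y), 1)

p_fivenum <- ggplot(xy, aes(x, y)) +
  geom_point() +
  geom_rug(alpha = 0.3) +
  scale_x_continuous(breaks = brks_x) +
  scale_y_continuous(breaks = brks_y) +
  theme_minimal(base_size = 18) +
  # drop the grid lines so the linear axes are not mistaken for non-linear ones
  theme(panel.grid = element_blank())

p_fivenum
```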
I tried those options, but wasn't satisfied by the results or the messy code one would need to use to get there. Lucky me, Thomas Lin Pedersen just developed a package called patchwork, which gets the job done in a pretty elegant manner.
If you want to create a scatterplot with marginal histograms, you first have to create those three plots separately.
library(ggplot2)
x <- rnorm(300)
y <- rt(300, df = 2)
xy <- data.frame(x, y)
plot1 <- ggplot(xy, aes(x = x, y = y)) +
geom_point()
dens1 <- ggplot(xy, aes(x = x)) +
geom_histogram(color = "black", fill = "white") +
theme_void()
dens2 <- ggplot(xy, aes(x = y)) +
geom_histogram(color = "black", fill = "white") +
theme_void() +
coord_flip()
The only thing left to do is to add those plots with a simple + and specify the layout with the function plot_layout().
library(patchwork)
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(
ncol = 2,
nrow = 2,
widths = c(4, 1),
heights = c(1, 4)
)
The function plot_spacer() adds an empty plot to the top right corner. All the other arguments should be self-explanatory.
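As an alternative to plot_spacer(), patchwork also accepts a textual layout through the design argument of plot_layout(), where # marks an empty cell. The following is only a sketch: the seed, the bin count, and the names design and combined are assumptions for illustration.

```r
library(ggplot2)
library(patchwork)

set.seed(1)
xy <- data.frame(x = rnorm(300), y = rt(300, df = 2))

plot1 <- ggplot(xy, aes(x, y)) + geom_point()
dens1 <- ggplot(xy, aes(x)) + geom_histogram(bins = 30) + theme_void()
dens2 <- ggplot(xy, aes(y)) + geom_histogram(bins = 30) + theme_void() +
  coord_flip()

# A = top margin, B = scatter, C = right margin, # = empty cell
design <- "
A#
BC
"

# plots are assigned to the areas A, B, C in the order they are added
combined <- dens1 + plot1 + dens2 +
  plot_layout(design = design, widths = c(4, 1), heights = c(1, 4))
combined
```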
Since histograms depend heavily on the chosen bin width, one might argue for density plots instead. With some small modifications, one gets, e.g. for eye-tracking data, a beautiful plot.
library(ggpubr)
plot1 <- ggplot(df, aes(x = Density, y = Face_sum, color = Group)) +
geom_point(aes(color = Group), size = 3) +
geom_point(shape = 1, color = "black", size = 3) +
stat_smooth(method = "lm", fullrange = TRUE) +
geom_rug() +
scale_y_continuous(name = "Number of fixated faces",
limits = c(0, 205), expand = c(0, 0)) +
scale_x_continuous(name = "Population density (lg10)",
limits = c(1, 4), expand = c(0, 0)) +
theme_pubr() +
theme(legend.position = c(0.15, 0.9))
dens1 <- ggplot(df, aes(x = Density, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none")
dens2 <- ggplot(df, aes(x = Face_sum, fill = Group)) +
geom_density(alpha = 0.4) +
theme_void() +
theme(legend.position = "none") +
coord_flip()
dens1 + plot_spacer() + plot1 + dens2 +
plot_layout(ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
Though the data is not provided at this point, the underlying principles should be clear.
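For a version that can be run as is, here is a sketch of the same idea with the built-in iris data standing in for the eye-tracking data. The choice of dataset, the legend placement, and the name combined_iris are assumptions, not part of the original example.

```r
library(ggplot2)
library(patchwork)

# iris stands in for the eye-tracking data, which is not available here
plot1 <- ggplot(iris, aes(Sepal.Width, Sepal.Length, colour = Species)) +
  geom_point() +
  theme(legend.position = "bottom")

dens1 <- ggplot(iris, aes(Sepal.Width, fill = Species)) +
  geom_density(alpha = 0.4) +
  theme_void() +
  theme(legend.position = "none")

dens2 <- ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.4) +
  theme_void() +
  theme(legend.position = "none") +
  coord_flip()

combined_iris <- dens1 + plot_spacer() + plot1 + dens2 +
  plot_layout(ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4))
combined_iris
```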
As there was no satisfying solution for this kind of plot when comparing different groups, I wrote a function to do this.
It works for both grouped and ungrouped data and accepts additional graphical parameters:
marginal_plot(x = iris$Sepal.Width, y = iris$Sepal.Length)
marginal_plot(x = Sepal.Width, y = Sepal.Length, group = Species, data = iris, bw = "nrd", lm_formula = NULL, xlab = "Sepal width", ylab = "Sepal length", pch = 15, cex = 0.5)
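The definition of marginal_plot() is not included here, so the calls above cannot be run from this text alone. As a stand-in, and explicitly a different technique from the author's function, grouped marginal densities can be sketched with ggExtra. The names p_iris and p_grouped are assumptions for illustration.

```r
library(ggplot2)

p_iris <- ggplot(iris, aes(Sepal.Width, Sepal.Length, colour = Species)) +
  geom_point() +
  theme(legend.position = "bottom")

# groupColour / groupFill split the marginal densities by the colour mapping
p_grouped <- ggExtra::ggMarginal(p_iris, type = "density",
                                 groupColour = TRUE, groupFill = TRUE)
p_grouped
```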