realdata = https://www.dropbox.com/s/pc5tp2lfhafgaiy/realdata.txt
simulation = https://www.dropbox.com/s/5ep95808xg7bon3/simulation.txt
A density plot of this data using bandwidth=1.5 gives me the following plot:
realdata = scan("realdata.txt")
simulation = scan("simulation.txt")
plot(density(log10(realdata), bw=1.5))
lines(density(log10(simulation), bw=1.5), lty=2)
But when I plot the same data with ggplot2, the bandwidth argument (adjust) seems to work differently. Why?
vec1 = data.frame(x=log10(realdata))
vec2 = data.frame(x=log10(simulation))
require(ggplot2)
ggplot() +
geom_density(aes(x=x, linetype="real data"), data=vec1, adjust=1.5) +
geom_density(aes(x=x, linetype="simulation"), data=vec2, adjust=1.5) +
scale_linetype_manual(name="data", values=c("real data"="solid", "simulation"="dashed"))
Suggestions on how to better smooth this data are also very welcome!
The function geom_point() adds a layer of points to your plot, which creates a scatterplot. ggplot2 comes with many geom functions that each add a different type of layer to a plot.
geom_density() computes and draws a kernel density estimate, which is a smoothed version of the histogram. This is a useful alternative to the histogram for continuous data that come from an underlying smooth distribution.
A kernel density plot is a non-parametric estimate that requires a bandwidth to be chosen. In R you can set the bandwidth with the bw argument of the density() function. In general, a large bandwidth oversmooths the density curve, and a small one undersmooths (overfits) it.
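The effect of the bandwidth can be checked numerically. The following is a minimal sketch on synthetic data (not the files from the question); the helper count_peaks is a hypothetical name introduced only for this illustration.

```r
# Synthetic bimodal sample, used only for illustration
set.seed(1)
x <- c(rnorm(200, mean = 0), rnorm(200, mean = 4))

d_small <- density(x, bw = 0.05)  # small bandwidth: wiggly, undersmoothed curve
d_large <- density(x, bw = 3)     # large bandwidth: flat, oversmoothed curve

# Hypothetical helper: count the local maxima of a density curve
count_peaks <- function(d) sum(diff(sign(diff(d$y))) == -2)

n_small <- count_peaks(d_small)
n_large <- count_peaks(d_large)
cat("peaks with bw = 0.05:", n_small, "\n")
cat("peaks with bw = 3:   ", n_large, "\n")
```

Calling plot(d_small) followed by lines(d_large, lty=2) should show the same contrast visually: the small bandwidth follows individual points, while the large one blurs the two modes together.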
A density plot is an alternative to a histogram for visualizing the distribution of a continuous variable. The peaks of a density plot help to identify where values are concentrated over the interval of the continuous variable.
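As a rough illustration of reading a peak off a density estimate, the sketch below uses synthetic data centred at 2 (an assumption made for this example only) and takes the x position where the estimated density is highest.

```r
# Synthetic data, used only for illustration
set.seed(42)
x <- rnorm(500, mean = 2, sd = 0.5)

d <- density(x)                   # default bandwidth (bw.nrd0)
peak_x <- d$x[which.max(d$y)]     # x position of the highest point of the curve
cat("estimated peak location:", peak_x, "\n")
```

The estimated location depends on the bandwidth, so it is best treated as approximate.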
adjust= is not the same as bw=. When you plot
plot(density(log10(realdata), adjust=1.5))
lines(density(log10(simulation), adjust=1.5), lty=2)
you get the same thing as ggplot.
At the time of writing, ggplot does not let you specify a bw= parameter directly. By default, density uses bw.nrd0(), so while you changed this value for the plot using base graphics, you cannot change it using ggplot. What actually gets used is adjust*bw. Since we know how to calculate the default bw, we can recalculate adjust= to give us the same value.
# helper function: convert a target bandwidth b into the adjust factor for data x
bw <- function(b, x) { b/bw.nrd0(x) }
require(ggplot2)
ggplot() +
geom_density(aes(x=x, linetype="real data"), data=vec1, adjust=bw(1.5, vec1$x)) +
geom_density(aes(x=x, linetype="simulation"), data=vec2, adjust=bw(1.5, vec2$x)) +
scale_linetype_manual(name="data",
values=c("real data"="solid", "simulation"="dashed"))
And that results in a plot which is the same as the base graphics plot.
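The adjust*bw relationship can be checked in base R without any plotting. This is a minimal sketch that uses a synthetic sample as a stand-in for the question's data (the files themselves are not assumed to be available); the variable names are illustrative.

```r
# Synthetic stand-in for log10(realdata); not the actual data from the question
set.seed(123)
x <- log10(rlnorm(1000, meanlog = 2, sdlog = 1))

target_bw <- 1.5
adj <- target_bw / bw.nrd0(x)     # the adjust factor that reproduces bw = 1.5

d_bw  <- density(x, bw = target_bw)   # bandwidth given directly
d_adj <- density(x, adjust = adj)     # default bandwidth scaled by adjust

# Both calls should end up with the same bandwidth and the same curve
same_bw    <- isTRUE(all.equal(d_bw$bw, d_adj$bw))
same_curve <- isTRUE(all.equal(d_bw$y, d_adj$y))
cat("same bandwidth:", same_bw, " same curve:", same_curve, "\n")
```

Because the default bandwidth depends on the data, the adjust factor has to be computed separately for each vector, which is why the helper above is called once with vec1$x and once with vec2$x.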