Understanding bandwidth smoothing in ggplot2

Tags:

ggplot2

realdata = https://www.dropbox.com/s/pc5tp2lfhafgaiy/realdata.txt

simulation = https://www.dropbox.com/s/5ep95808xg7bon3/simulation.txt

Plotting the density of this data with a bandwidth of 1.5 (bw=1.5) gives me the following plot:

realdata = scan("realdata.txt")
simulation = scan("simulation.txt")
plot(density(log10(realdata), bw=1.5))
lines(density(log10(simulation), bw=1.5), lty=2)

[Image: base-graphics density plot of both data sets with bw=1.5]

But when I use ggplot2 to plot the same data, the bandwidth argument (adjust) seems to work differently. Why?

vec1 = data.frame(x=log10(realdata))
vec2 = data.frame(x=log10(simulation))
require(ggplot2)
ggplot() +
  geom_density(aes(x=x, linetype="real data"), data=vec1, adjust=1.5) +
  geom_density(aes(x=x, linetype="simulation"), data=vec2, adjust=1.5) +
  scale_linetype_manual(name="data", values=c("real data"="solid", "simulation"="dashed"))

[Image: ggplot2 density plot of both data sets with adjust=1.5]

Suggestions on how to better smooth this data are also very welcome!


asked Jul 27 '14 20:07

vitor

1 Answer

adjust= is not the same as bw=. If you make the base-graphics plot with adjust= instead of bw=

plot(density(log10(realdata), adjust=1.5))
lines(density(log10(simulation), adjust=1.5), lty=2)

you get the same thing as ggplot:

[Image: base-graphics density plot with adjust=1.5, matching the ggplot2 output]
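To see why the two plots differ, here is a minimal sketch comparing the bandwidth that density() actually reports for each argument. It assumes simulated stand-in data (normal values roughly on a log10 scale), since realdata.txt is not reproduced here; with the real files, substitute log10(realdata) for x.

```r
# Stand-in data (assumption): simulated values in place of log10(realdata),
# since the original data files are not reproduced here.
set.seed(1)
x <- rnorm(1000, mean = 1.3, sd = 0.4)

# bw = 1.5 sets the kernel's standard deviation to exactly 1.5
d_bw <- density(x, bw = 1.5)

# adjust = 1.5 instead multiplies the default bw.nrd0(x) bandwidth by 1.5
d_adj <- density(x, adjust = 1.5)

cat("bandwidth used with bw = 1.5:    ", d_bw$bw, "\n")
cat("bandwidth used with adjust = 1.5:", d_adj$bw, "\n")
cat("1.5 * bw.nrd0(x):                ", 1.5 * bw.nrd0(x), "\n")
```

With data on this scale bw.nrd0(x) is well below 1, so adjust=1.5 gives a much narrower kernel than bw=1.5.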

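At the time of writing, ggplot's geom_density() did not accept a bw= parameter (newer ggplot2 releases do accept bw= in geom_density()/stat_density()). By default, density() chooses the bandwidth with bw.nrd0(), so while you overrode that value in the base-graphics plot, you could not override it the same way in ggplot. What ggplot does let you control is the multiplier: the bandwidth actually used is adjust*bw. Since we know how the default bw is calculated, we can solve for the adjust= value that gives us the same bandwidth, namely adjust = 1.5/bw.nrd0(x).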

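# helper: convert a desired absolute bandwidth b into the adjust= multiplier,
# assuming the default bw.nrd0() bandwidth selector is in use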
bw<-function(b, x) { b/bw.nrd0(x) }

require(ggplot2)
ggplot() +
  geom_density(aes(x=x, linetype="real data"), data=vec1, adjust=bw(1.5, vec1$x)) +
  geom_density(aes(x=x, linetype="simulation"), data=vec2, adjust=bw(1.5, vec2$x)) +
  scale_linetype_manual(name="data",
    values=c("real data"="solid", "simulation"="dashed"))

And that results in

[Image: ggplot2 density plot using the converted adjust= values]

which is the same as the base graphics plot.
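The conversion can also be checked numerically with stats::density() alone. This is a sketch on simulated stand-in data (an assumption), and it relies on geom_density() using bw.nrd0() as its default bandwidth selector, as described above. It compares the estimate obtained with bw=1.5 against the one obtained with the converted adjust= value.

```r
# Helper from the answer: turn a desired absolute bandwidth b into the
# adjust= multiplier, assuming the default bw.nrd0() selector is in use.
bw <- function(b, x) { b / bw.nrd0(x) }

# Stand-in data (assumption): simulated values instead of log10(realdata).
set.seed(2)
x <- rnorm(500, mean = 1.3, sd = 0.4)

d_base <- density(x, bw = 1.5)             # what the base-graphics plot used
d_conv <- density(x, adjust = bw(1.5, x))  # what the converted adjust= gives

# Both should report the same final bandwidth and (up to floating-point
# error) the same density curve.
cat("bandwidths:", d_base$bw, d_conv$bw, "\n")
cat("max |difference| in estimates:", max(abs(d_base$y - d_conv$y)), "\n")
```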

answered Oct 02 '22 09:10

MrFlick
