
Smooth a binary variable using moving average or kernel smoothing

Tags: r, smoothing

I have data of the form:

x      y
0      0
0.01   1
0.03   0
0.04   1
0.04   0

x is continuous from 0 to 1 and not equally spaced and y is binary.

I'd like to smooth y over the x-axis using R, but can't find the right package. The kernel smoothing functions I've found either produce density estimates of x or give the wrong estimate at the ends of the x range, because they average over regions below 0 and above 1.

I'd also like to avoid linear smoothers like Loess, given the binary form of y. The moving average functions I've seen assume equally spaced x-values.

Do you know of any R functions that will smooth and ideally have a bandwidth selection procedure? I can write a moving average function and cross-validate to determine the bandwidth, but I'd prefer to find an existing function that's been vetted.
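For concreteness, the hand-rolled option mentioned above might look roughly like the sketch below. It assumes a Gaussian kernel (a Nadaraya-Watson weighted average rather than a plain moving window), leave-one-out cross-validation on squared error, and simulated data; `nw_smooth` and `loo_cv` are hypothetical helper names, not functions from any package.

```r
# Nadaraya-Watson estimate at points x0, Gaussian kernel with bandwidth h.
# nw_smooth and loo_cv are hypothetical helper names, not from any package.
nw_smooth <- function(x0, x, y, h) {
  sapply(x0, function(p) {
    w <- dnorm((x - p) / h)   # kernel weight of each observation
    sum(w * y) / sum(w)       # weighted mean of y, so it stays within [0, 1]
  })
}

# Leave-one-out cross-validation score (mean squared error) for bandwidth h.
loo_cv <- function(h, x, y) {
  pred <- sapply(seq_along(x), function(i) nw_smooth(x[i], x[-i], y[-i], h))
  mean((y - pred)^2)
}

set.seed(1)
x <- sort(runif(200))                    # unequally spaced x in [0, 1] (simulated)
y <- rbinom(200, 1, plogis(4 * x - 2))   # simulated binary response

hs <- seq(0.02, 0.5, by = 0.02)          # candidate bandwidths (an assumption)
cv <- sapply(hs, function(h) loo_cv(h, x, y))
h_best <- hs[which.min(cv)]              # bandwidth with the lowest CV score

fit <- nw_smooth(seq(0, 1, length.out = 101), x, y, h_best)
```

Because each estimate is a weighted mean of the observed y values, it cannot leave [0, 1], and unequal spacing in x is handled by the weights. The estimate can still be biased near x = 0 and x = 1, where observations fall on only one side.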

user1910316 asked Dec 17 '12 15:12


1 Answer

I would suggest using something like

d <- data.frame(x,y) ## not absolutely necessary but good practice
library(mgcv)
m1 <- gam(y~s(x),family="binomial",data=d)

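This will (1) respect the binary nature of the data and (2) select the degree of smoothness (the "bandwidth", in your terminology) automatically. By default mgcv does this with a generalized cross-validation-type criterion (method="GCV.Cp", which uses UBRE when the scale parameter is known, as it is for the binomial family).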

Use

plot(y~x, data=d)
pp <- data.frame(x=seq(0,1,length.out=101))
pp$y <- predict(m1,newdata=pp,type="response")
with(pp,lines(x,y))

or

library(ggplot2)
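## current ggplot2 passes model arguments such as family via method.args
ggplot(d,aes(x,y))+geom_smooth(method="gam",formula=y~s(x),method.args=list(family=binomial))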

to get predictions/plot the results.

(I hope your real data set has more than 5 observations ... otherwise this will fail ...)
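As a self-contained check of the approach above, here is a minimal sketch on simulated data. The sample size, the "true" logistic curve, and the rough ±2 standard error band are all assumptions made for illustration. Predictions are formed on the link (logit) scale and back-transformed with `plogis`, so the fitted curve and its band stay between 0 and 1 all the way to the ends of the x range.

```r
library(mgcv)

set.seed(101)
d <- data.frame(x = sort(runif(300)))         # unequally spaced x in [0, 1] (simulated)
d$y <- rbinom(300, 1, plogis(6 * d$x - 3))    # binary response from an assumed true curve

m1 <- gam(y ~ s(x), family = binomial, data = d)

pp <- data.frame(x = seq(0, 1, length.out = 101))
# type = "link" returns predictions on the logit scale; se.fit adds standard errors
pr <- predict(m1, newdata = pp, type = "link", se.fit = TRUE)
pp$y   <- plogis(as.numeric(pr$fit))                       # fitted probability
pp$lwr <- plogis(as.numeric(pr$fit - 2 * pr$se.fit))       # rough lower band
pp$upr <- plogis(as.numeric(pr$fit + 2 * pr$se.fit))       # rough upper band

plot(y ~ x, data = d)
with(pp, lines(x, y))
with(pp, lines(x, lwr, lty = 2))
with(pp, lines(x, upr, lty = 2))
```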

Ben Bolker answered Sep 22 '22 22:09