Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Piecewise regression with R: plotting the segments

Tags:

I have 54 points. They represent offer and demand for products. I would like to show there is a break point in the offer.

First, I sort the x-axis (offer) and remove the values that appears twice. I have 47 values, but I remove the first and last ones (doesn't make sense to consider them as break points). Break is of length 45:

Break<-(sort(unique(offer))[2:46]) 

Then, for each of these potential break points, I estimate a model and I keep in "d" the residual standard error (sixth element in model summary object).

d<-numeric(45) for (i in 1:45) { model<-lm(demand~(offer<Break[i])*offer + (offer>=Break[i])*offer) d[i]<-summary(model)[[6]] } 

Plotting d, I notice that my smaller residual standard error is 34, that corresponds to "Break[34]": 22.4. So I write my model with my final break point:

model<-lm(demand~(offer<22.4)*offer + (offer>=22.4)*offer) 

Finally, I'm happy with my new model. It's significantly better than the simple linear one. And I want to draw it:

plot(demand~offer) i <- order(offer) lines(offer[i], predict(model,list(offer))[i]) 

But I have a warning message:

Warning message: In predict.lm(model, list(offer)) :   prediction from a rank-deficient fit may be misleading 

And more important, the lines are really strange on my plot.

My plot with the supposedly two segments, but not joining

Here are my data:

demand <- c(1155, 362, 357, 111, 703, 494, 410, 63, 616, 468, 973, 235,             180, 69, 305, 106, 155, 422, 44, 1008, 225, 321, 1001, 531, 143,             251, 216, 57, 146, 226, 169, 32, 75, 102, 4, 68, 102, 462, 295,             196, 50, 739, 287, 226, 706, 127, 85, 234, 153, 4, 373, 54, 81,             18) offer <- c(39.3, 23.5, 22.4, 6.1, 35.9, 35.5, 23.2, 9.1, 27.5, 28.6, 41.3,            16.9, 18.2, 9, 28.6, 12.7, 11.8, 27.9, 21.6, 45.9, 11.4, 16.6,            40.7, 22.4, 17.4, 14.3, 14.6, 6.6, 10.6, 14.3, 3.4, 5.1, 4.1,            4.1, 1.7, 7.5, 7.8, 22.6, 8.6, 7.7, 7.8, 34.7, 15.6, 18.5, 35,            16.5, 11.3, 7.7, 14.8, 2, 12.4, 9.2, 11.8, 3.9) 
like image 573
Antonin Avatar asked Jan 06 '12 13:01

Antonin


People also ask

What is a segmented regression model?

Segmented regression analysis is a method for statistically modelling the interrupted time series data to draw more formal conclusions about the impact of an intervention or event on the measure of interest. Two parameters define each segment of a time series: level and trend.

What is a piecewise linear regression model?

Piecewise linear regression is a form of regression that allows multiple linear models to be fitted to the data for different ranges of X. The regression function at the breakpoint may be discontinuous, but it is possible to specify the model such that the model is continuous at all points.


1 Answers

Here is an easier approach using ggplot2.

require(ggplot2) qplot(offer, demand, group = offer > 22.4, geom = c('point', 'smooth'),     method = 'lm', se = F, data = dat) 

EDIT. I would also recommend taking a look at this package segmented which supports automatic detection and estimation of segmented regression models.

enter image description here

UPDATE:

Here is an example that makes use of the R package segmented to automatically detect the breaks

library(segmented) set.seed(12) xx <- 1:100 zz <- runif(100) yy <- 2 + 1.5*pmax(xx - 35, 0) - 1.5*pmax(xx - 70, 0) + 15*pmax(zz - .5, 0) +    rnorm(100,0,2) dati <- data.frame(x = xx, y = yy, z = zz) out.lm <- lm(y ~ x, data = dati) o <- segmented(out.lm, seg.Z = ~x, psi = list(x = c(30,60)),   control = seg.control(display = FALSE) ) dat2 = data.frame(x = xx, y = broken.line(o)$fit)  library(ggplot2) ggplot(dati, aes(x = x, y = y)) +   geom_point() +   geom_line(data = dat2, color = 'blue') 

segmented

like image 144
Ramnath Avatar answered Nov 07 '22 18:11

Ramnath