I have 54 points. They represent offer and demand for products. I would like to show there is a break point in the offer.
First, I sort the x-axis (offer) and remove the values that appears twice. I have 47 values, but I remove the first and last ones (doesn't make sense to consider them as break points). Break is of length 45:
Break<-(sort(unique(offer))[2:46])
Then, for each of these potential break points, I estimate a model and I keep in "d" the residual standard error (sixth element in model summary object).
d<-numeric(45) for (i in 1:45) { model<-lm(demand~(offer<Break[i])*offer + (offer>=Break[i])*offer) d[i]<-summary(model)[[6]] }
Plotting d, I notice that my smaller residual standard error is 34, that corresponds to "Break[34]": 22.4. So I write my model with my final break point:
model<-lm(demand~(offer<22.4)*offer + (offer>=22.4)*offer)
Finally, I'm happy with my new model. It's significantly better than the simple linear one. And I want to draw it:
plot(demand~offer) i <- order(offer) lines(offer[i], predict(model,list(offer))[i])
But I have a warning message:
Warning message: In predict.lm(model, list(offer)) : prediction from a rank-deficient fit may be misleading
And more important, the lines are really strange on my plot.
Here are my data:
demand <- c(1155, 362, 357, 111, 703, 494, 410, 63, 616, 468, 973, 235, 180, 69, 305, 106, 155, 422, 44, 1008, 225, 321, 1001, 531, 143, 251, 216, 57, 146, 226, 169, 32, 75, 102, 4, 68, 102, 462, 295, 196, 50, 739, 287, 226, 706, 127, 85, 234, 153, 4, 373, 54, 81, 18) offer <- c(39.3, 23.5, 22.4, 6.1, 35.9, 35.5, 23.2, 9.1, 27.5, 28.6, 41.3, 16.9, 18.2, 9, 28.6, 12.7, 11.8, 27.9, 21.6, 45.9, 11.4, 16.6, 40.7, 22.4, 17.4, 14.3, 14.6, 6.6, 10.6, 14.3, 3.4, 5.1, 4.1, 4.1, 1.7, 7.5, 7.8, 22.6, 8.6, 7.7, 7.8, 34.7, 15.6, 18.5, 35, 16.5, 11.3, 7.7, 14.8, 2, 12.4, 9.2, 11.8, 3.9)
Segmented regression analysis is a method for statistically modelling the interrupted time series data to draw more formal conclusions about the impact of an intervention or event on the measure of interest. Two parameters define each segment of a time series: level and trend.
Piecewise linear regression is a form of regression that allows multiple linear models to be fitted to the data for different ranges of X. The regression function at the breakpoint may be discontinuous, but it is possible to specify the model such that the model is continuous at all points.
Here is an easier approach using ggplot2
.
require(ggplot2) qplot(offer, demand, group = offer > 22.4, geom = c('point', 'smooth'), method = 'lm', se = F, data = dat)
EDIT. I would also recommend taking a look at this package segmented
which supports automatic detection and estimation of segmented regression models.
UPDATE:
Here is an example that makes use of the R package segmented to automatically detect the breaks
library(segmented) set.seed(12) xx <- 1:100 zz <- runif(100) yy <- 2 + 1.5*pmax(xx - 35, 0) - 1.5*pmax(xx - 70, 0) + 15*pmax(zz - .5, 0) + rnorm(100,0,2) dati <- data.frame(x = xx, y = yy, z = zz) out.lm <- lm(y ~ x, data = dati) o <- segmented(out.lm, seg.Z = ~x, psi = list(x = c(30,60)), control = seg.control(display = FALSE) ) dat2 = data.frame(x = xx, y = broken.line(o)$fit) library(ggplot2) ggplot(dati, aes(x = x, y = y)) + geom_point() + geom_line(data = dat2, color = 'blue')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With