
Forecasting time series data

Tags:

r

time-series

xts

I've done some research but I am stuck finding a solution. I have time series data in a very basic data frame; let's call it x:

    Date        Used
    11/1/2011   587
    11/2/2011   578
    11/3/2011   600
    11/4/2011   599
    11/5/2011   678
    11/6/2011   555
    11/7/2011   650
    11/8/2011   700
    11/9/2011   600
    11/10/2011  550
    11/11/2011  600
    11/12/2011  610
    11/13/2011  590
    11/14/2011  595
    11/15/2011  601
    11/16/2011  700
    11/17/2011  650
    11/18/2011  620
    11/19/2011  645
    11/20/2011  650
    11/21/2011  639
    11/22/2011  620
    11/23/2011  600
    11/24/2011  550
    11/25/2011  600
    11/26/2011  610
    11/27/2011  590
    11/28/2011  595
    11/29/2011  601
    11/30/2011  700
    12/1/2011   650
    12/2/2011   620
    12/3/2011   645
    12/4/2011   650
    12/5/2011   639
    12/6/2011   620
    12/7/2011   600
    12/8/2011   550
    12/9/2011   600
    12/10/2011  610
    12/11/2011  590
    12/12/2011  595
    12/13/2011  601
    12/14/2011  700
    12/15/2011  650
    12/16/2011  620
    12/17/2011  645
    12/18/2011  650
    12/19/2011  639
    12/20/2011  620
    12/21/2011  600
    12/22/2011  550
    12/23/2011  600
    12/24/2011  610
    12/25/2011  590
    12/26/2011  750
    12/27/2011  750
    12/28/2011  666
    12/29/2011  678
    12/30/2011  800
    12/31/2011  750

I would really appreciate any help with this. I am working with time series data and need to be able to create a forecast based on historical data.

  1. First I tried to convert it to xts:

    x.xts <- xts(x$Used, x$Date) 
  2. Then, I converted x.xts to regular time series:

    x.ts <- as.ts(x.xts) 
  3. Put the values in ets:

    x.ets <- ets(x.ts) 
  4. Performed forecasting for 10 periods:

    x.fore <- forecast(x.ets, h=10) 
  5. x.fore is this:

       Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
    87       932.9199 831.7766 1034.063 778.2346 1087.605
    88       932.9199 818.1745 1047.665 757.4319 1108.408
    89       932.9199 805.9985 1059.841 738.8103 1127.029
    90       932.9199 794.8706 1070.969 721.7918 1144.048
    91       932.9199 784.5550 1081.285 706.0153 1159.824
    92       932.9199 774.8922 1090.948 691.2375 1174.602
    93       932.9199 765.7692 1100.071 677.2849 1188.555
    94       932.9199 757.1017 1108.738 664.0292 1201.811
    95       932.9199 748.8254 1117.014 651.3717 1214.468
    96       932.9199 740.8897 1124.950 639.2351 1226.605
  6. When I try to plot the x.fore, I get a graph but the x-axis is showing numbers rather than dates:

[image: plot of x.fore with a numeric x-axis]

Are the steps I am doing correct? How can I change the x-axis to show dates?

I thank you so much for any input.

george willy asked Apr 24 '12 16:04



2 Answers

Here's what I did:

    library(xts)
    library(forecast)

    x$Date = as.Date(x$Date, format="%m/%d/%Y")
    x = xts(x=x$Used, order.by=x$Date)
    # To get the start date (305)
    #     > as.POSIXlt(x = "2011-11-01", origin="2011-11-01")$yday
    ##    [1] 304
    # Add one since that starts at "0"
    x.ts = ts(x, freq=365, start=c(2011, 305))
    plot(forecast(ets(x.ts), 10))

Resulting in:

Example output

What can we learn from this:

  • Many of your steps can be combined, reducing the number of intermediate objects you create.
  • The output is still not as pretty as @joran's, but it is easily readable. 2011.85 means "day number 365*.85" (roughly day 310 of the year).
  • The day of the year can be found with as.POSIXlt(x = "2011-11-01", origin="2011-11-01")$yday (zero-based, so add one), and the date for a given day number with something like as.Date(310 - 1, origin="2011-01-01") (the origin itself is day 1).
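The day-of-year arithmetic in the last bullet can be checked in base R. This is only a minimal sketch; the variable names are mine and not part of the original code. The thing to watch is that `yday` counts from 0, so the conversion needs an offset of one in each direction:

```r
# Day of year for 1 Nov 2011. POSIXlt's yday counts from 0, so add 1.
start_date <- as.Date("2011-11-01")
day_number <- as.POSIXlt(start_date)$yday + 1

# Going back: the origin (1 Jan) is day 1, so subtract 1 before adding.
recovered <- as.Date(day_number - 1, origin = "2011-01-01")

print(day_number)
print(recovered)
```

`day_number` is what goes into the second element of `start=c(2011, ...)` when building the ts object.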

Update

You can drop even more intermediate steps, since there's no reason to first convert your data into an xts.

    x = ts(x$Used, start=c(2011, as.POSIXlt("2011-11-01")$yday+1), frequency=365)
    # NOTE: We have only selected the "Used" variable
    # since ts will take care of dates
    plot(forecast(ets(x), 10))

This gives exactly the same result as the image above.

Update 2

Building on the solution provided by @joran, you can try:

    # 'start' calculation = `as.Date("2011-11-01")-as.Date("2011-01-01")+1`
    # No need to convert anything to dates at this point using xts
    x = ts(x$Used, start=c(2011, 305), frequency=365)
    # Directly plot your forecast without your axes
    plot(forecast(ets(x), 10), axes = FALSE)
    # Generate labels for your x-axis
    a = seq(as.Date("2011-11-01"), by="weeks", length=11)
    # Plot your axes.
    # `at` is an approximation--there's probably a better way to do this,
    # but the logic is approximately 365.25 days in a year, and an origin
    # date in R of `January 1, 1970`
    axis(1, at = as.numeric(a)/365.25+1970, labels = a, cex.axis=0.6)
    axis(2, cex.axis=0.6)

Which will yield:

Second attempt

Part of the problem in your original code is that after you have converted your data to an xts object, and converted that to a ts object, you lose the dates in your forecast points.

Compare the row labels (the first column, to the left of Point Forecast) of your x.fore output to the following:

    > forecast(ets(x), 10)
             Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
    2012.000       741.6437 681.7991 801.4884 650.1192 833.1682
    2012.003       741.6437 676.1250 807.1624 641.4415 841.8459
    2012.005       741.6437 670.9047 812.3828 633.4577 849.8298
    2012.008       741.6437 666.0439 817.2435 626.0238 857.2637
    2012.011       741.6437 661.4774 821.8101 619.0398 864.2476
    2012.014       741.6437 657.1573 826.1302 612.4328 870.8547
    2012.016       741.6437 653.0476 830.2399 606.1476 877.1399
    2012.019       741.6437 649.1202 834.1672 600.1413 883.1462
    2012.022       741.6437 645.3530 837.9345 594.3797 888.9078
    2012.025       741.6437 641.7276 841.5599 588.8352 894.4523
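The difference between the two sets of row labels comes down to the time index of the ts object. Here is a small base-R sketch of that idea; it is an analogy rather than the xts conversion itself, and it uses only the first five values from the question as illustrative data:

```r
# Illustrative values only (first five rows of the question's data).
used <- c(587, 578, 600, 599, 678)

# Without start/frequency, the time index is just 1, 2, 3, ...
plain <- ts(used)
print(time(plain))

# With a start and frequency, the index becomes decimal years.
dated <- ts(used, start = c(2011, 305), frequency = 365)
print(time(dated))
```

Forecasts simply continue whichever index the series carries, which is why a plain index produces row labels like 87, 88, ... while a dated index produces 2012.000, 2012.003, and so on.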

Hopefully this helps you understand the problem with your original approach and improves your ability to deal with time series in R.

Update 3

Final, and more accurate solution--because I'm avoiding other work that I should actually be doing right now...

Use the lubridate package for better date handling:

    require(lubridate)
    y = ts(x$Used, start=c(2011, yday("2011-11-01")), frequency=365)
    plot(forecast(ets(y), 10), xaxt="n")
    a = seq(as.Date("2011-11-01"), by="weeks", length=11)
    axis(1, at = decimal_date(a), labels = format(a, "%Y %b %d"), cex.axis=0.6)
    abline(v = decimal_date(a), col='grey', lwd=0.5)

Resulting in:

Final plot

Note the alternative method of identifying the start date for your ts object.
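To see why the decimal-date tick positions are more accurate than the 365.25-day approximation from Update 2, the two can be compared in base R. This is only a sketch: `decimal_year` is a hypothetical helper I wrote for illustration (year plus elapsed fraction of that year), and I'm not claiming it matches lubridate's `decimal_date` in every detail:

```r
# Hypothetical helper: decimal year of a Date, using the actual year length.
decimal_year <- function(d) {
  lt <- as.POSIXlt(d)
  year <- lt$year + 1900
  leap <- (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
  year + lt$yday / ifelse(leap, 366, 365)
}

# Same weekly tick dates as in the plotting code above.
a <- seq(as.Date("2011-11-01"), by = "weeks", length.out = 11)
exact  <- decimal_year(a)
approx <- as.numeric(a) / 365.25 + 1970

# Difference between the two sets of axis positions, expressed in days.
print(round((approx - exact) * 365, 2))
```

The offsets are under a day for these dates, which is small but visible when each observation on the plot is itself one day.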

A5C1D2H2I1M1N2O1R2T1 answered Oct 01 '22 05:10


If you don't have a preference for a specific model, I suggest using one that applies to a wide range of situations:

    library(forecast)
    t.ser <- ts(used, start=c(2011,1), freq=12)
    t.ets <- ets(t.ser)
    t.fc <- forecast(t.ets,h=10)

This will give you the prediction for the next 10 months.

More technically, it uses exponential smoothing, which is a good choice for general situations. Depending on the kind of data, there might be a better model for your specific use, but ets is a good general choice.

It's important to highlight that since you don't have two complete periods (less than 24 months), the model cannot detect seasonality, and therefore it won't be included in the calculations.
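The "two complete periods" rule of thumb is easy to check before fitting. A minimal sketch in base R; `has_two_cycles` is a hypothetical helper that only mirrors the rule as stated here, not whatever check ets performs internally, and the series values are placeholders:

```r
# Hypothetical helper: does the series cover at least two seasonal cycles?
has_two_cycles <- function(series) {
  length(series) >= 2 * frequency(series)
}

# 61 placeholder observations, matching the length of the question's data.
daily  <- ts(rep(600, 61), start = c(2011, 305), frequency = 365)
weekly <- ts(rep(600, 61), frequency = 7)

print(has_two_cycles(daily))
print(has_two_cycles(weekly))
```

With 61 observations the answer depends entirely on the frequency you declare, so it is worth deciding what one "period" means for your data before reading anything into a missing seasonal component.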

João Daniel answered Oct 01 '22 05:10