
How to add time series objects (ts) in a data.table, by row?

I'm trying to store ts objects by row. The monthly data (24 monthly values for 1980 and 1981) used to create the time series are stored row-wise in DT, so I just want to add a column to DT that stores a "ts" object for each row. Here's a reproducible example where I tried three different options, but none of them works as I expected.

library(data.table)
DT <- data.table(ID=1:10,
                 JAN_1980=rnorm(10), FEB_1980=rnorm(10), MAR_1980=rnorm(10), APR_1980=rnorm(10),
                 MAY_1980=rnorm(10), JUN_1980=rnorm(10), JUL_1980=rnorm(10), AUG_1980=rnorm(10),
                 SEP_1980=rnorm(10), OCT_1980=rnorm(10), NOV_1980=rnorm(10), DEC_1980=rnorm(10),
                 JAN_1981=rnorm(10), FEB_1981=rnorm(10), MAR_1981=rnorm(10), APR_1981=rnorm(10),
                 MAY_1981=rnorm(10), JUN_1981=rnorm(10), JUL_1981=rnorm(10), AUG_1981=rnorm(10),
                 SEP_1981=rnorm(10), OCT_1981=rnorm(10), NOV_1981=rnorm(10), DEC_1981=rnorm(10))

# First attempt
DT[,TS_COL:=ts(.SD[,2:25,with=FALSE], start=c(1980,1), frequency=12)]

# Second
DT[,TS_COL:=ts(unlist(.SD[,2:25,with=FALSE]), start=c(1980,1), frequency=12)]

# Third
DT[,TS_COL:=list(list(list(ts(unlist(.SD[,2:25,with=FALSE]), start=c(1980,1), frequency=12))))]

I'd like to be able to access a ts object for a specific row in this way (no luck yet):

DT[1,TS_COL]

...and get something like (2 years of monthly data):

             Jan         Feb         Mar         Apr         May         Jun         Jul         Aug         Sep         Oct         Nov         Dec
1980  2.13303849  0.74954206 -0.45112504  2.13558888  1.11883498 -0.39074470  1.77374480 -0.19513901  0.49920019 -1.12875185  0.45598049  1.97730211
1981  0.62764761 -0.86330094 -0.51585664  0.59677770 -0.71073980 -0.26208961 -0.38833227  1.39841244 -1.50490225 -0.72018921  1.06684672  0.07126184

Any hint on how to achieve this?

Guillermo E Ponce-Campos asked Jan 07 '16 20:01



1 Answer

I can't remember ever using ts() myself. I tend to have irregular time series stored in long format: either a single datetime column, or separate date and time columns (for rolling to the prevailing observation within a day but not to the previous day). Then I create a regularly spaced time series and join that to the data, or find the beginning and end of windows using which and roll, and extract the subset for that window.
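As a rough illustration of the long-format approach described above, here is a minimal sketch of a rolling join onto a regular grid. The tables obs and grid, their column names, and the timestamps are made up for illustration; only the roll argument of data.table joins is taken as given.

```r
library(data.table)

# Hypothetical irregular observations stored in long format
obs <- data.table(
  time  = as.POSIXct(c("2016-01-07 09:00:03",
                       "2016-01-07 09:00:41",
                       "2016-01-07 09:02:10"), tz = "UTC"),
  value = c(1.5, 1.7, 1.6)
)

# Hypothetical regular one-minute grid
grid <- data.table(
  time = seq(as.POSIXct("2016-01-07 09:01:00", tz = "UTC"),
             by = "1 min", length.out = 3)
)

# For each grid time, take the prevailing (last earlier) observation
regular <- obs[grid, on = "time", roll = TRUE]
regular
```

Each row of regular carries the grid time together with the most recent observed value at or before it.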

That said, let's try with ts().

Please include the error or warning message in your question. See items 6 and 7 on the Support page. Your example is not reproducible: I get the following warnings, but you may be getting a different one (you didn't include it, so there's nothing to attempt to reproduce). Nor is the example minimal, because we don't need 20 columns that wrap around the console output.

DT[,TS_COL:=ts(.SD[,2:25,with=FALSE], start=c(1980,1), frequency=12)]
# Warning messages:
# 1: In `[.data.table`(DT, , `:=`(TS_COL, ts(.SD[, 2:25, with = FALSE],  :
# 24 column matrix RHS of := will be treated as one vector
# 2: In `[.data.table`(DT, , `:=`(TS_COL, ts(.SD[, 2:25, with = FALSE],  :
#  Supplied 240 items to be assigned to 10 items of column 'TS_COL' (230 unused)

First things first, let's look at the manual. ?ts contains the following signature:

ts(data = NA, start = 1, end = numeric(), frequency = 1, deltat = 1, ts.eps = getOption("ts.eps"), class = , names = )

You're using the first argument, data, so under that it says:

data: a vector or matrix of the observed time-series values. A data frame will be coerced to a numeric matrix via data.matrix. (See also ‘Details’.)

Since a data.table inherits from data.frame, it is a data.frame too. Therefore the data.table will be coerced to a matrix.
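A quick way to check this coercion for yourself, sketched on a small made-up table (the column names and values are placeholders, not the asker's data):

```r
library(data.table)

# Small stand-in for the asker's table
DT <- data.table(ID = 1:3,
                 JAN = c(0.1, 0.2, 0.3),
                 FEB = c(1.1, 1.2, 1.3))

# A data.table is also a data.frame
inherits(DT, "data.frame")

# data.matrix() is the coercion that ?ts says is applied to data frames
m <- data.matrix(DT[, c("JAN", "FEB"), with = FALSE])
is.matrix(m)
dim(m)
```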

Further down, we see something about the matrix case:

In the matrix case, each column of the matrix data is assumed to contain a single (univariate) time series.
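To make the column-wise convention concrete, here is a small sketch with invented numbers. It also shows that data laid out with one series per row would need transposing before being passed to ts(); the transposition step is an illustration, not something the documentation prescribes.

```r
# Three time points for two series, one series per column
m <- matrix(c(1, 2, 3,
              10, 20, 30),
            nrow = 3, dimnames = list(NULL, c("A", "B")))
x <- ts(m, start = c(1980, 1), frequency = 12)
class(x)   # includes "mts" and "ts"
ncol(x)    # number of series
nrow(x)    # number of time points

# Wide data with one series per ROW (as in the question) must be
# transposed so that each series becomes a column
wide <- matrix(1:6, nrow = 2, byrow = TRUE)  # 2 rows (IDs) x 3 months
y <- ts(t(wide), start = c(1980, 1), frequency = 12)
y[, 1]     # the series built from the first row of wide
```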

Now let's split the problem up and inspect the RHS it's trying to assign. Simply remove the TS_COL:= part and run it again to return the RHS so we can have a look at it.

RHS = DT[,ts(.SD[,2:25,with=FALSE], start=c(1980,1), frequency=12)]
class(RHS)
# [1] "mts"    "ts"     "matrix"
dim(RHS)
# [1] 10 24
dim(DT)
# [1] 10 26
length(RHS)
# [1] 240
storage.mode(RHS)
# [1] "double"

So it's a matrix. And worse, it is double and not integer. (Recall we don't like Date in base either for use in data.table because, oddly, Date is double rather than integer.)

You can't store a matrix as a column in data.table. data.table treats the matrix as the vector it is internally, which is what the warning messages (shown above in this answer) are alluding to. Here are the warning messages again:

24 column matrix RHS of := will be treated as one vector
Supplied 240 items to be assigned to 10 items of column 'TS_COL' (230 unused)

These warnings are created by data.table code and are pretty good I think.
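The "matrix is one vector internally" point can be seen in base R alone. A minimal sketch with a toy matrix:

```r
# A matrix is a single vector with a dim attribute, stored column by column
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns
length(m)                    # counts all cells, not rows
as.vector(m)                 # cells in column-major order
attributes(m)                # only dim distinguishes it from a plain vector
```

This is why a 10 x 24 matrix shows up as 240 items when it is offered to a column that has room for 10.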

So if you are to proceed with using the ts class as a column of data.table, then you need to coerce the matrix to a list of 24 columns (24 vectors, all 10 long) rather than a matrix of 24 columns (internally one vector 240 long).
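A minimal sketch of that coercion, on a cut-down table with three rows and three monthly columns. The second half goes beyond what the answer states: it assumes the asker wants one ts object per row and stores those in a list column, which is one possible reading rather than a recommendation from the answer. The names month_cols, col_list and ts_per_row are hypothetical.

```r
library(data.table)

set.seed(1)
DT <- data.table(ID = 1:3,
                 JAN_1980 = rnorm(3), FEB_1980 = rnorm(3), MAR_1980 = rnorm(3))
month_cols <- setdiff(names(DT), "ID")
m <- data.matrix(DT[, month_cols, with = FALSE])

# Matrix -> list of column vectors (one vector per month, each nrow(DT) long)
col_list <- lapply(seq_len(ncol(m)), function(j) unname(m[, j]))

# Assumption: one ts per row, kept in a list column.
# Wrapping in list() makes the RHS a single column whose cells are ts objects.
ts_per_row <- lapply(seq_len(nrow(m)), function(i)
  ts(unname(m[i, ]), start = c(1980, 1), frequency = 12))
DT[, TS_COL := list(ts_per_row)]

# A list column returns a list, so extract the cell with [[1]]
DT[1, TS_COL][[1]]
```

Note that DT[1, TS_COL] on its own returns a list of length one; the [[1]] is what yields the ts object itself.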

But at this point it seems the ts() class is not the right tool for the job. What do you really need to do? Better to back up and describe what the bigger picture is.

Matt Dowle answered Oct 07 '22 01:10