
Creating Hierarchical-Data Structure, Nodes in HTS R

I am trying to create the node structure using the hts package in R. The documentation on nodes is sparse, so coding the node structure correctly is difficult. To add another layer of complexity, I am trying to create two hierarchies, as follows:

(Hierarchy 1 - Geography: example is US state Delaware and its counties)

=> 10000
    => 10001 
    => 10003          
    => 10005
    => 10999

(Hierarchy 2 - Industry: simplified)

=> 10
     => 11
     => 12 
     => 21 
     => 22 
     => 31
     ...
     => 99

Edit 2 - Corrected hierarchies and further clarification

So each timeseries will have a geography code and an industry code. The geography codes follow one hierarchy and the industry codes another (shown above).

I'm trying to figure out how to specify the "nodes" argument to represent the relationships of both hierarchies (the documentation example only shows a single hierarchy).

When the two hierarchies interact, we get additional levels. Let's simplify by assuming there are only 2 industries, 11 and 12. The timeseries identified by (10001,11) and (10001,12) must add up to (10001,10); and also, (10001,11)...(10999,11) must add up to (10000,11), etc etc. Again, these are simplified hierarchies - in the real data there are more levels.
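The adding-up constraints described above can be sketched in base R. This is only an illustration on made-up data: the two counties, the two industries, the "county_industry" column naming and the object names are all assumptions, and nothing here uses the hts package.

```r
# Hypothetical toy data: 3 time periods of bottom-level series for
# counties 10001 and 10003 crossed with industries 11 and 12.
set.seed(1)
bottom <- matrix(rpois(12, lambda = 50), nrow = 3)
colnames(bottom) <- c("10001_11", "10001_12", "10003_11", "10003_12")

# Split the (assumed) column names into the two codes
county   <- sub("_.*", "", colnames(bottom))
industry <- sub(".*_", "", colnames(bottom))

# Sum over industries within each county: the (county, 10) series
by_county <- t(rowsum(t(bottom), group = county))

# Sum over counties within each industry: the (10000, industry) series
by_industry <- t(rowsum(t(bottom), group = industry))

# Both aggregations must give the same grand total, (10000, 10)
total <- rowSums(bottom)
stopifnot(all(rowSums(by_county) == total),
          all(rowSums(by_industry) == total))
```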

The question is, what does the "nodes" argument look like for two hierarchies? Hope this helps.

j riot asked Jun 12 '14 18:06



1 Answer

Your notation (which may not be your choice) is making this very confusing. It seems like the same numerical sequence can refer to either a county or an industry.

However, the basic idea is clear enough: you have two hierarchies and you want both types of aggregation to be taken into account. Here is an example using my own notation to make it clearer.

Suppose there are two states with four and five counties respectively, and two industries with three and two sub-industries respectively. So there are 9 x 5 = 45 series at the most disaggregated level (county x sub-industry combinations). I will call the states A and B, and the counties A1,A2,A3,A4 and B1,B2,B3,B4,B5. I will call the industries X and Y, with sub-industries Xa,Xb,Xc and Ya,Yb respectively. Suppose you have the bottom-level series (the most disaggregated level) in a matrix y, with one column per series, and columns in the following order:

 County A1, industry Xa
 County A1, industry Xb
 County A1, industry Xc
 County A1, industry Ya
 County A1, industry Yb
 County A2, industry Xa
 County A2, industry Xb
 County A2, industry Xc
 County A2, industry Ya
 County A2, industry Yb
...
 County B5, industry Xa
 County B5, industry Xb
 County B5, industry Xc
 County B5, industry Ya
 County B5, industry Yb

So that we have a reproducible example, I will create y randomly:

library(hts) # provides gts(), used below
y <- ts(matrix(rnorm(900),ncol=45,nrow=20)) # 20 observations of 45 series

Then we can construct labels for the columns of this matrix as follows:

blnames <- paste(c(rep("A",20),rep("B",25)), # State
             rep(1:9,each=5), # County
             rep(c("X","X","X","Y","Y"),9), # Industry
             rep(c("a","b","c","a","b"),9), # Sub-industry
             sep="")
colnames(y) <- blnames

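For example, the first series in the matrix has name "A1Xa" meaning state A, county 1, industry X, sub-industry a. Note that this code numbers the counties 1 to 9 across both states, so the counties in state B are labelled B5 to B9 rather than B1 to B5 as in the list above; the grouping is unaffected because each county label is still unique.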
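As a quick sanity check on the labels, something like the following base-R sketch can be run. It only rebuilds blnames as above and inspects it; it does not need the hts package.

```r
# Rebuild the bottom-level labels exactly as in the answer
blnames <- paste(c(rep("A",20),rep("B",25)), # State
                 rep(1:9,each=5),            # County
                 rep(c("X","X","X","Y","Y"),9), # Industry
                 rep(c("a","b","c","a","b"),9), # Sub-industry
                 sep="")

# One label per column of y, and no label used twice
stopifnot(length(blnames) == 45, !anyDuplicated(blnames))

head(blnames, 6)
# "A1Xa" "A1Xb" "A1Xc" "A1Ya" "A1Yb" "A2Xa"
tail(blnames, 1)
# "B9Yb"
```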

We can then easily create the grouped time series object using

gy <- gts(y, characters=list(c(1,1),c(1,1)))

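The characters argument indicates there are two hierarchies (two elements in the list). Each vector gives the number of characters in the label used by each level of that hierarchy: the first hierarchy is specified by the first two characters (state, then county), and the second hierarchy by the next two characters (industry, then sub-industry).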
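To see what that splitting implies, here is a rough base-R sketch that cuts the labels at the same character widths and counts the groups at each level. It mimics the idea only and is not the package's internal code; widths, starts and level_labels are names made up for the illustration.

```r
blnames <- paste(c(rep("A",20),rep("B",25)), rep(1:9,each=5),
                 rep(c("X","X","X","Y","Y"),9),
                 rep(c("a","b","c","a","b"),9), sep="")

# Same value as passed to the characters argument
widths <- list(c(1, 1), c(1, 1))

# Start position of each hierarchy within the label: 1 and 3
starts <- cumsum(c(1, head(sapply(widths, sum), -1)))

# For each hierarchy, take the label prefix up to each level
level_labels <- list()
for (h in seq_along(widths)) {
  ends <- starts[h] + cumsum(widths[[h]]) - 1
  for (e in ends) {
    level_labels[[length(level_labels) + 1]] <- substr(blnames, starts[h], e)
  }
}

# Number of distinct groups at each level:
# state, county, industry, sub-industry
n_groups <- sapply(level_labels, function(x) length(unique(x)))
n_groups
# 2 9 2 5
```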

A slightly more complicated but analogous example (with labels taking more than one character each) is given in the help file for gts in v4.3 of the hts package.

It is possible to specify the grouping structure without using column labels. Then you have to specify the groups matrix which defines what aggregations are of interest. In the example above, the groups matrix is given by

gps <- rbind(
  c(rep(1,20),rep(2,25)), # State
  rep(1:9,each=5), # County
  rep(c(1,1,1,2,2),9), # Industry
  rep(1:5, 9), # Sub-industry
  c(rep(c(1,1,1,2,2),4),rep(c(3,3,3,4,4),5)), # State x industry
  c(rep(1:5, 4),rep(6:10, 5)), # State x Sub-industry
  rep(1:18, rep(c(3,2),9)) # County x industry
)

Then

gy <- gts(y, groups=gps)
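One way to check a hand-written groups matrix is to derive the same rows from the column labels and compare. The sketch below does that in base R only; the helper code() is a hypothetical function written for this check, not part of hts.

```r
blnames <- paste(c(rep("A",20),rep("B",25)), rep(1:9,each=5),
                 rep(c("X","X","X","Y","Y"),9),
                 rep(c("a","b","c","a","b"),9), sep="")

# The hand-written groups matrix from above
gps <- rbind(
  c(rep(1,20),rep(2,25)),
  rep(1:9,each=5),
  rep(c(1,1,1,2,2),9),
  rep(1:5, 9),
  c(rep(c(1,1,1,2,2),4),rep(c(3,3,3,4,4),5)),
  c(rep(1:5, 4),rep(6:10, 5)),
  rep(1:18, rep(c(3,2),9))
)

# Pieces of each label
state    <- substr(blnames, 1, 1)
county   <- substr(blnames, 1, 2)
industry <- substr(blnames, 3, 3)
subind   <- substr(blnames, 3, 4)

# Hypothetical helper: integer codes in order of first appearance
code <- function(...) {
  key <- paste(..., sep = ".")
  match(key, unique(key))
}

derived <- rbind(
  code(state), code(county), code(industry), code(subind),
  code(state, industry), code(state, subind), code(county, industry)
)

# The derived rows agree with the hand-written ones
stopifnot(all(derived == gps))
```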

It is much easier to use the column names approach with the characters argument, as constructing all those cross-product rows by hand can get confusing.

Rob Hyndman answered Sep 28 '22 07:09
