I am trying to create the node structure using the hts package in R. The documentation on nodes is sparse, so coding the node structure correctly is difficult. As an added complication, I am trying to create two hierarchies, as follows:
(Hierarchy 1 - Geography: example is US state Delaware and its counties)
=> 10000
=> 10001
=> 10003
=> 10005
=> 10999
(Hierarchy 2 - Industry: simplified)
=> 10
=> 11
=> 12
=> 21
=> 22
=> 31
...
=> 99
Edit 2 - Corrected hierarchies and further clarification
So each timeseries will have a geography code and an industry code. The geography codes follow one hierarchy and the industry codes another (shown above).
I'm trying to figure out how to specify the "nodes" argument to represent the relationships of both hierarchies (the documentation example only shows a single hierarchy).
When the two hierarchies interact, we get additional levels. Let's simplify by assuming there are only 2 industries, 11 and 12. The timeseries identified by (10001,11) and (10001,12) must add up to (10001,10); and also, (10001,11)...(10999,11) must add up to (10000,11), etc etc. Again, these are simplified hierarchies - in the real data there are more levels.
The question is: what does the "nodes" argument look like for two hierarchies? Hope this helps.
Your notation (which may not be your choice) is making this very confusing. It seems like the same numerical sequence can refer to either a county or an industry.
However, the basic idea is clear enough: you have two hierarchies and you want both types of aggregation to be taken into account. Here is an example using my own notation to make it clearer.
Suppose there are two states with four and five counties respectively, and two industries with three and two sub-industries respectively. So there are 9x5 = 45 series at the most disaggregated level (sub-industry x county combinations). I will call the states A and B, and the counties A1,A2,A3,A4 and B1,B2,B3,B4,B5. I will call the industries X and Y with sub-industries Xa,Xb,Xc and Ya,Yb respectively. Suppose you have the bottom level series (the most disaggregated level) in a matrix y, with one column per series, and columns in the following order:
County A1, industry Xa
County A1, industry Xb
County A1, industry Xc
County A1, industry Ya
County A1, industry Yb
County A2, industry Xa
County A2, industry Xb
County A2, industry Xc
County A2, industry Ya
County A2, industry Yb
...
County B5, industry Xa
County B5, industry Xb
County B5, industry Xc
County B5, industry Ya
County B5, industry Yb
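So that we have a reproducible example, I will create y randomly:
set.seed(1) # fix the seed so the random draws can be reproduced
y <- ts(matrix(rnorm(900),ncol=45,nrow=20)) # 20 observations on each of the 45 series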
Then we can construct labels for the columns of this matrix as follows:
blnames <- paste(c(rep("A",20),rep("B",25)),           # State
                 rep(1:9,each=5),                      # County
                 rep(c("X","X","X","Y","Y"),9),        # Industry
                 rep(c("a","b","c","a","b"),9),        # Sub-industry
                 sep="")
colnames(y) <- blnames
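For example, the first series in the matrix has name "A1Xa", meaning state A, county 1, industry X, sub-industry a. Note that the code numbers the counties 1 to 9 consecutively, so the five counties of state B (called B1,...,B5 above) get the labels B5,...,B9. This does not change the grouping structure.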
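As a quick sanity check on the labelling scheme, the labels can be split back into their four parts using base R only. This is a minimal sketch: decode_label() is a hypothetical helper written for illustration and is not part of hts.

```r
# Rebuild the column labels exactly as in the paste() call above.
blnames <- paste(c(rep("A", 20), rep("B", 25)),        # State
                 rep(1:9, each = 5),                   # County
                 rep(c("X", "X", "X", "Y", "Y"), 9),   # Industry
                 rep(c("a", "b", "c", "a", "b"), 9),   # Sub-industry
                 sep = "")

# decode_label() is a hypothetical helper (not an hts function):
# it reads one character per level from each four-character label.
decode_label <- function(x) {
  data.frame(label       = x,
             state       = substr(x, 1, 1),
             county      = substr(x, 2, 2),
             industry    = substr(x, 3, 3),
             subindustry = substr(x, 4, 4),
             stringsAsFactors = FALSE)
}

parts <- decode_label(blnames)
head(parts)

# Every bottom-level series should have a distinct four-character label.
stopifnot(length(blnames) == 45,
          all(nchar(blnames) == 4),
          !anyDuplicated(blnames))
```

If any of the checks fails, the labels would not identify the bottom-level series uniquely, and the label-based approach would not be safe to use.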
We can then easily create the grouped time series object using
library(hts)
gy <- gts(y, characters=list(c(1,1),c(1,1)))
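The characters argument indicates there are two hierarchies (two elements in the list). Each element gives the number of characters used by each level of that hierarchy: the first hierarchy (state, then county) is specified by the first two characters of the label, and the second hierarchy (industry, then sub-industry) by the next two characters.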
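To make the role of the character widths concrete, here is a rough base R sketch that counts how many groups each level would contain. It assumes that the levels within a hierarchy are read as cumulative prefixes of that hierarchy's segment (so a county is identified by state plus county digit); that is my reading of the description above, not something checked against the hts source. The names widths, starts and level_keys are made up for the illustration.

```r
blnames <- paste(c(rep("A", 20), rep("B", 25)),
                 rep(1:9, each = 5),
                 rep(c("X", "X", "X", "Y", "Y"), 9),
                 rep(c("a", "b", "c", "a", "b"), 9),
                 sep = "")

# Same widths as in characters = list(c(1, 1), c(1, 1)).
widths <- list(geography = c(1, 1), industry = c(1, 1))

# Position in the label where each hierarchy starts (1 and 3 here).
starts <- unname(cumsum(c(1, head(sapply(widths, sum), -1))))

# Assumption: each level is the cumulative prefix of its hierarchy's segment.
level_keys <- function(x, start, w) {
  lapply(cumsum(w), function(n) substr(x, start, start + n - 1))
}

geo <- level_keys(blnames, starts[1], widths$geography)  # state, county
ind <- level_keys(blnames, starts[2], widths$industry)   # industry, sub-industry

n_groups <- c(state       = length(unique(geo[[1]])),
              county      = length(unique(geo[[2]])),
              industry    = length(unique(ind[[1]])),
              subindustry = length(unique(ind[[2]])))
print(n_groups)
```

Under that assumption the counts are 2 states, 9 counties, 2 industries and 5 sub-industries, which matches the setup described at the start.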
A slightly more complicated but analogous example (with labels taking more than one character each) is given in the help file for gts in v4.3 of the hts package.
It is possible to specify the grouping structure without using column labels. Then you have to specify the groups matrix which defines what aggregations are of interest. In the example above, the groups matrix is given by
gps <- rbind(
  c(rep(1,20),rep(2,25)),                       # State
  rep(1:9,each=5),                              # County
  rep(c(1,1,1,2,2),9),                          # Industry
  rep(1:5, 9),                                  # Sub-industry
  c(rep(c(1,1,1,2,2),4),rep(c(3,3,3,4,4),5)),   # State x industry
  c(rep(1:5, 4),rep(6:10, 5)),                  # State x Sub-industry
  rep(1:18, rep(c(3,2),9))                      # County x industry
)
Then
gy <- gts(y, groups=gps)
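To see what a row of the groups matrix means, it may help to compute the implied aggregates by hand. The following base R sketch does not use hts at all: it sums the bottom-level columns by the state and county rows and checks that the county totals add up to the state totals. The helper agg() and the plain matrix y are stand-ins chosen for the illustration.

```r
set.seed(1)
y <- matrix(rnorm(900), ncol = 45, nrow = 20)  # plain matrix standing in for the ts object

state  <- c(rep(1, 20), rep(2, 25))  # first row of the groups matrix
county <- rep(1:9, each = 5)         # second row of the groups matrix

# rowsum() adds rows by group, so transpose to add columns by group.
# agg() is a hypothetical helper, not an hts function.
agg <- function(y, g) t(rowsum(t(y), g))

state_tot  <- agg(y, state)    # 20 x 2: one aggregate series per state
county_tot <- agg(y, county)   # 20 x 9: one aggregate series per county

# Counties 1-4 belong to state 1 and counties 5-9 to state 2,
# so summing county totals within each state must give the state totals.
county_state <- c(rep(1, 4), rep(2, 5))
stopifnot(isTRUE(all.equal(unname(agg(county_tot, county_state)),
                           unname(state_tot))))
```

The same idea applies to every other row: each distinct value in a row defines one aggregate series, formed by adding the bottom-level columns that share that value.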
It is much easier to use the column names approach with the characters argument, as constructing all those cross-product rows can get confusing.
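If you do need the groups matrix, one way to guard against mistakes in the cross-product rows is to derive them from the labels and compare with the hand-written version. This is only a sketch in base R; idx() is a hypothetical helper that numbers the groups in order of first appearance, which happens to match the numbering used above.

```r
blnames <- paste(c(rep("A", 20), rep("B", 25)),
                 rep(1:9, each = 5),
                 rep(c("X", "X", "X", "Y", "Y"), 9),
                 rep(c("a", "b", "c", "a", "b"), 9),
                 sep = "")

# Hand-written groups matrix from above.
gps <- rbind(
  c(rep(1, 20), rep(2, 25)),                            # State
  rep(1:9, each = 5),                                   # County
  rep(c(1, 1, 1, 2, 2), 9),                             # Industry
  rep(1:5, 9),                                          # Sub-industry
  c(rep(c(1, 1, 1, 2, 2), 4), rep(c(3, 3, 3, 4, 4), 5)),  # State x industry
  c(rep(1:5, 4), rep(6:10, 5)),                         # State x Sub-industry
  rep(1:18, rep(c(3, 2), 9))                            # County x industry
)

# idx() is a hypothetical helper: integer codes in order of first appearance.
idx <- function(key) as.integer(factor(key, levels = unique(key)))

state  <- substr(blnames, 1, 1)
county <- substr(blnames, 1, 2)
ind    <- substr(blnames, 3, 3)
sub    <- substr(blnames, 3, 4)

derived <- rbind(
  State            = idx(state),
  County           = idx(county),
  Industry         = idx(ind),
  SubIndustry      = idx(sub),
  StateIndustry    = idx(paste(state, ind)),
  StateSubIndustry = idx(paste(state, sub)),
  CountyIndustry   = idx(paste(county, ind))
)

# The derived rows should agree with the hand-written ones entry by entry.
stopifnot(all(derived == gps))
```

The derived matrix has the same entries as gps, so it could be passed as the groups argument in its place.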