zoo object aggregation

Tags: dataframe, r, zoo

Dear Community,

the data I receive will be in a data frame:

Var_1      Var_2         Date        VaR_3  VaR_4   VaR_5   Var_6
1           4       2010-01-18         7    apple    10    sweet
2           5       2010-07-19         8    orange   11    sour
3           6       2010-01-18         9    kiwi     12    juicy
...        ...      ...               ...   ...     ...    ... 

I would like to use zoo, since it seems to be a flexible object class. I'm just starting with R and I tried to read the description (vignettes) for the package.

Questions:

  1. Given the above data as a data frame, what is the recommended way to convert the complete data frame into a zoo object, telling zoo to use the third column as the date column (dates can occur multiple times in the data)?
  2. How do I aggregate all other columns monthly, except the categorical columns VaR_4 and Var_6, using zoo's built-in functions? Is zoo able to automatically discard categorical variables and use only the columns that are suited for aggregation?
  3. How do I aggregate all numeric columns monthly for each category in VaR_4 (Var_6 shall not be included, since it is non-numeric)?

Thanks for your support.

John asked Oct 24 '11 11:10


1 Answer

zoo objects are time series and are normally numeric vectors or matrices. What you really seem to have is a collection of separate time series, with column 5 (VaR_4) identifying which series each row belongs to. That is, there is an apple series, an orange series, a kiwi series, etc., and each of them has several columns.

Dropping the last column since it's not numeric, using the third column as the index and splitting on column 5, we have:

# create test data
Lines <- "Var_1      Var_2         Date        VaR_3  VaR_4   VaR_5   Var_6
1           4       2010-01-18         7    apple    10    sweet
2           5       2010-07-19         8    orange   11    sour
3           6       2010-01-18         9    kiwi     12    juicy"
cat(Lines, "\n", file = "data.txt")

library(zoo)
# split on VaR_4 (column 5, the fruit name); Var_6 is skipped on input
z <- read.zoo("data.txt", header = TRUE, index = 3, split = "VaR_4",
  colClasses = c(Var_6 = "NULL"))

The result is:

> z
           Var_1.apple Var_2.apple VaR_3.apple VaR_5.apple Var_1.kiwi
2010-01-18           1           4           7          10          3
2010-07-19          NA          NA          NA          NA         NA
           Var_2.kiwi VaR_3.kiwi VaR_5.kiwi Var_1.orange Var_2.orange
2010-01-18          6          9         12           NA           NA
2010-07-19         NA         NA         NA            2            5
           VaR_3.orange VaR_5.orange
2010-01-18           NA           NA
2010-07-19            8           11

The above assumes that, for a given value of column 5, the dates are unique. If that is not the case, include the aggregate = mean argument, or pass some other function as aggregate.
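As a minimal sketch of the duplicate-date case: the input below is made up for illustration (Lines2 and z2 are hypothetical names, and only two value columns are kept), with "apple" appearing twice on the same date. Passing aggregate = mean should collapse those rows to one per date within each category.

```r
library(zoo)

# hypothetical input: "apple" occurs twice on 2010-01-18
Lines2 <- "Var_1 Date VaR_4 VaR_5
1 2010-01-18 apple 10
3 2010-01-18 apple 20
2 2010-07-19 orange 11"

# aggregate = mean averages rows that share a date within each VaR_4 group;
# format is given explicitly so the index is read as Date
z2 <- read.zoo(text = Lines2, header = TRUE, index = "Date", split = "VaR_4",
  format = "%Y-%m-%d", aggregate = mean)
z2
```

Any other summary function (for example sum or median) can be supplied in place of mean.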

To now aggregate it into a monthly zoo series we have:

aggregate(z, as.yearmon, mean)
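One thing to watch for with split series is missing values. The sketch below uses a small made-up series (x, m1 and m2 are illustrative names, not part of the question's data) to show that extra arguments to aggregate are passed on to the summary function, so na.rm = TRUE can be used to ignore NAs within a month.

```r
library(zoo)

# hypothetical daily series with one missing value in July
dates <- as.Date(c("2010-01-04", "2010-01-18", "2010-07-05", "2010-07-19"))
x <- zoo(cbind(a = c(1, 3, NA, 8), b = c(10, 20, 30, 40)), dates)

# without na.rm, a month that contains an NA aggregates to NA
m1 <- aggregate(x, as.yearmon, mean)

# arguments after FUN are handed to FUN, so NAs are dropped before averaging
m2 <- aggregate(x, as.yearmon, mean, na.rm = TRUE)
```

Note that if every value in a month is NA, mean with na.rm = TRUE gives NaN rather than NA.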

It would also be possible to convert it straight away to monthly by using the FUN = as.yearmon argument:

zm <- read.zoo("data.txt", header = TRUE, index = "Date", split = "VaR_4", 
  FUN = as.yearmon, colClasses = c(Var_6 = "NULL"), aggregate = mean)
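The question starts from a data frame rather than a file. According to ?read.zoo, the file argument may also be a data frame, so something along the following lines should work. This is a sketch under assumptions: DF is a stand-in built from the sample rows, the Date column is assumed to be of class Date already, and Var_6 is dropped by hand because colClasses is a read.table argument and is not used when no file is read.

```r
library(zoo)

# stand-in for the data frame described in the question
DF <- data.frame(
  Var_1 = 1:3,
  Var_2 = 4:6,
  Date  = as.Date(c("2010-01-18", "2010-07-19", "2010-01-18")),
  VaR_3 = 7:9,
  VaR_4 = c("apple", "orange", "kiwi"),
  VaR_5 = 10:12,
  Var_6 = c("sweet", "sour", "juicy")
)

# remove the non-numeric column that should not be aggregated
DF$Var_6 <- NULL

# convert to monthly series, one set of columns per VaR_4 category
zm2 <- read.zoo(DF, index = "Date", split = "VaR_4",
  FUN = as.yearmon, aggregate = mean)
```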

See ?read.zoo, vignette("zoo-read"), ?aggregate.zoo and the other vignettes and help files as well.

G. Grothendieck answered Sep 19 '22 13:09