
What is the most elegant way to calculate seasonal means with R?

I have an evenly spaced time series of daily mean observational data.

How do I compute seasonal means the easiest way? The seasons should follow the meteorological nomenclature with DJF (=winter: December, January, February), MAM, JJA, and SON.

That means the December values come from year x-1.

The calculation of monthly means is nicely presented here: How to calculate a monthly mean?

It is possible to follow this idea when computing seasonal means. However, several caveats make it not very transparent, and one has to be careful!

I also dealt with a small part of this issue already in a former thread: How to switch rows in R?

Here is now the complete story:

0: make a random time series

ts.pdsi <- data.frame(date = seq(
                from=as.Date("1901-01-01"), 
                to=as.Date("2009-12-31"), 
                by="day"))
ts.pdsi$scPDSI <- rnorm(nrow(ts.pdsi),  mean=1, sd=1)    # add some data

1st: use the seas package and add seasons to your timeseries, which has to be formatted as a data.frame.

library(seas)
# add month/seasons
ts.pdsi$month  <- mkseas(ts.pdsi,"mon")   # add months
ts.pdsi$seas <- mkseas(ts.pdsi,"DJF")     # add seasons
ts.pdsi$seasyear <- paste(format(ts.pdsi[,1],"%Y"), 
                          ts.pdsi$seas ,sep="")   # add seasyears, e.g. 1950DJF

This gives:

> head(ts.pdsi)
    date      scPDSI month seas seasyear
1 1901-01-01 -0.10881074   Jan  DJF  1901DJF
2 1901-02-01 -0.22287750   Feb  DJF  1901DJF
3 1901-03-01 -0.12233192   Mär  MAM  1901MAM
4 1901-04-01 -0.04440915   Apr  MAM  1901MAM
5 1901-05-01 -0.36334082   Mai  MAM  1901MAM
6 1901-06-01 -0.52079030   Jun  JJA  1901JJA

2nd: You can then calculate the seasonal means, following the above-mentioned approach, using the column $seasyear:

> MEAN <- tapply(ts.pdsi$scPDSI, ts.pdsi$seasyear, mean, na.rm = TRUE)
> head(MEAN)
1901DJF     1901JJA     1901MAM     1901SON     1902DJF     1902JJA 
-0.45451556 -0.72922229 -0.17669396 -1.12095590 -0.86523850 -0.04031273 

NOTE: spring (MAM) and summer (JJA) are switched due to strictly alphabetical sorting.
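The alphabetical ordering noted above comes from tapply converting a character grouping variable into a factor with sorted levels. A minimal sketch in base R, with hypothetical labels and values (not the real scPDSI data), of how a factor with explicit levels keeps the chronological order instead; it assumes the labels already appear in chronological order in the data:

```r
# Hypothetical season-year labels, already in chronological order
labels <- c("1901DJF", "1901MAM", "1901JJA", "1901SON")
vals   <- c(1, 2, 3, 4)

# Character index: levels are sorted alphabetically, so JJA precedes MAM
sorted_names <- names(tapply(vals, labels, mean))

# Factor with levels in order of first appearance: chronological order is kept
f <- factor(labels, levels = unique(labels))
ordered_names <- names(tapply(vals, f, mean))

print(sorted_names)
print(ordered_names)
```

With such a factor in place, the manual row switching in the next step may not be needed.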

3rd: switch it back

foo <- MEAN
for(i in 1:length(MEAN)) {
    if (i %% 4 == 2) {
        foo[i+1] <- foo[i]    #switch 2nd 3rd row (JJA <-> MAM)
        foo[i] <- MEAN[i+1]
    }
}
# and generate new names for the array
d <- data.frame(date=seq(from=as.Date("1901-01-01"), to=as.Date("2009-12-31"), by="+3 month"))
d$seas <- mkseas(d,"DJF") 
d$seasyear <- paste(format(d[,1],"%Y"), d$seas ,sep="")
names(foo)<-d$seasyear  # add right order colnames
MEAN <-foo

Finally, this results in a time series of seasonal means. Well, I find it too complicated, and I guess there are much easier solutions around.

Additionally, this solution has a major problem with the winter season DJF: December is not yet taken from the year before. This is rather easy to fix (I guess), but it makes the given approach even more complicated.

I really hope there are better ideas around!

asked Sep 24 '13 09:09 by stephan


2 Answers

Is this what you want?

library(seas)   # provides mkseas()

# create some data: daily values for three years
df <- data.frame(date = seq(from = as.Date("2007-01-01"),
                            to = as.Date("2009-12-31"),
                            by = "day"))
df$vals <- rnorm(nrow(df))

# add year
df$year <- format(df$date, "%Y")

# add season
df$seas <- mkseas(x = df, width = "DJF")

# calculate mean per season within each year
df2 <- aggregate(vals ~ seas + year, data = df, mean)

df2
#    seas year         vals
# 1   DJF 2007 -0.048407610
# 2   MAM 2007  0.086996842
# 3   JJA 2007  0.013864555
# 4   SON 2007 -0.081323367
# 5   DJF 2008  0.170887946
# 6   MAM 2008  0.147830260
# 7   JJA 2008  0.003008866
# 8   SON 2008 -0.057974215
# 9   DJF 2009 -0.043437437
# 10  MAM 2009 -0.048345979
# 11  JJA 2009  0.023860506
# 12  SON 2009 -0.060076870

Because mkseas converts the dates into a seasonal factor with levels in the desired order, the order is also correct after aggregating over year and season.
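Note that grouping by calendar year puts December together with the January and February of the same year. A minimal sketch of one way to count December towards the following winter, using base R only. The season factor is built by hand here as a stand-in for mkseas, and the names season_of_month and syear are illustrative assumptions:

```r
# Daily dates for three years; values are random placeholders
df <- data.frame(date = seq(as.Date("2007-01-01"), as.Date("2009-12-31"), by = "day"))
df$vals <- rnorm(nrow(df))

lt  <- as.POSIXlt(df$date)
mon <- lt$mon + 1                      # 1 = January ... 12 = December

# Season factor with levels in chronological order (stand-in for mkseas)
season_of_month <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
                     "JJA", "JJA", "SON", "SON", "SON", "DJF")
df$seas <- factor(season_of_month[mon], levels = c("DJF", "MAM", "JJA", "SON"))

# Season year: December counts towards the following year's winter
df$syear <- lt$year + 1900 + (mon == 12)

# Mean per season within each season year
df2 <- aggregate(vals ~ seas + syear, data = df, FUN = mean)
print(df2)
```

In this sketch the first winter (January and February 2007 only) and the last one (December 2009 only) are incomplete, so you may want to drop them before further analysis.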

answered Nov 11 '22 14:11 by Henrik


It's probably easier if you use numbers rather than strings for months and seasons, at least at first. You can get the seasons you want by simple arithmetic manipulations, including making December part of the subsequent year.

pdsi <- data.frame(date = seq(
            from=as.Date("1901-01-01"), 
            to=as.Date("2009-12-31"), 
            by="day"))
pdsi$scPDSI <- rnorm(nrow(pdsi),  mean=1, sd=1)
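pdsi$mon<-as.POSIXlt(pdsi$date)$mon+1       # $mon is 0-based (0 = January)
pdsi$seas<-floor((pdsi$mon %% 12)/3)+1      # 1=DJF, 2=MAM, 3=JJA, 4=SON
pdsi$year<-as.POSIXlt(pdsi$date)$year+1900  # $year counts years since 1900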
pdsi$syear<-pdsi$year
pdsi$syear[pdsi$mon==12]<-pdsi$syear[pdsi$mon==12]+1
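As a quick check of the season arithmetic above, the same expression can be applied to the twelve month numbers. This is only an illustration; the names mon and seas are local to the sketch:

```r
mon  <- 1:12
# mon %% 12 maps December to 0, so it lands in the same bin as January and February
seas <- floor((mon %% 12) / 3) + 1
print(setNames(seas, month.abb))
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#   1   1   2   2   2   3   3   3   4   4   4   1
```

Season 1 is therefore DJF, 2 is MAM, 3 is JJA, and 4 is SON.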

To compute seasonal means, you can simply do this:

meanArray<-tapply(pdsi$scPDSI,list(year=pdsi$syear,seas=pdsi$seas),mean)

And now you have

> head(meanArray)
      seas
year           1         2         3         4
  1901 1.0779676 1.0258306 1.1515175 0.9682434
  1902 0.9900312 0.8964994 1.1028336 1.0074296
  1903 0.9912233 0.9858088 1.1346901 1.0569518
  1904 0.7933653 1.1566892 1.1223454 0.8914211
  1905 1.1441863 1.1824074 0.9044940 0.8971485
  1906 0.9900826 0.9933909 0.9185972 0.8922987

If you want it as a flat array with appropriate names, first take the transpose, then flatten the array and add the names:

colnames(meanArray)<-c("DJF","MAM","JJA","SON")
meanArray<-t(meanArray)
MEAN<-array(meanArray)
names(MEAN)<-paste(colnames(meanArray)[col(meanArray)],rownames(meanArray)[row(meanArray)],sep="")

This gets you the result you wanted:

> head(MEAN)
  1901DJF   1901MAM   1901JJA   1901SON   1902DJF   1902MAM 
1.0779676 1.0258306 1.1515175 0.9682434 0.9900312 0.8964994  
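If a long data frame is more convenient than a named vector, one possible alternative is as.data.frame on a table. A minimal sketch, assuming a small stand-in for meanArray with hypothetical values and with the season names already set as column names:

```r
# Small stand-in for meanArray: 2 years x 4 seasons, hypothetical values
meanArray <- matrix(1:8, nrow = 2, byrow = TRUE,
                    dimnames = list(year = c("1901", "1902"),
                                    seas = c("DJF", "MAM", "JJA", "SON")))

# One row per year/season combination; year and seas become factors
long <- as.data.frame(as.table(meanArray), responseName = "mean")

# Sort chronologically: factors are ordered by their levels, not alphabetically
long <- long[order(long$year, long$seas), ]
print(long)
```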
answered Nov 11 '22 14:11 by mrip