
Why do I get different results using ccf() and cor() in R?

Tags: r, xts

Why do I get different correlation results between cor() and ccf()?

> library(xts)
> set.seed(123)
> ts1 = xts(1:100, as.POSIXlt(1366039619, tz="", origin="1970-01-01") + rnorm(100, 0, 3))
> ts2 = xts(1:100, as.POSIXlt(1366039619, tz="", origin="1970-01-01") + rnorm(100, 0, 3))

> as.vector(ccf(as.integer(ts1[,1]), as.integer(ts2[,1]), lag.max =10, plot =F, na.action=na.pass)$acf)
 [1] -0.13747975 -0.00747975 -0.09497750 -0.01031203 -0.07564956  0.19881488 -0.11353135  0.01673867  0.12900690  0.00059706 -0.09642964  0.20852985  0.02476448  0.00126913 -0.03467147 -0.04284728 -0.05561356
[18]  0.08875188  0.01587159 -0.04449745  0.01002100

> sapply(seq(-10, 10), function(x, ts1, ts2) { cor(ts1[,1], lag(ts2[,1], x), use="complete.obs") }, ts1, ts2)
 [1] -0.154055651 -0.008411318 -0.104222576 -0.011595184 -0.082495425  0.210464976 -0.118454928  0.018112365  0.132716811  0.000694595 -0.096429643  0.209312640  0.025156993  0.001450175 -0.035451383
[16] -0.043902825 -0.057842616  0.093863686  0.017485161 -0.047042779  0.011511559

> sapply(seq(-10, 10), function(x, ts1, ts2) { cor(ts1[,1], lag(ts2[,1], x), use="complete.obs") }, ts1, ts2) - as.vector(ccf(as.integer(ts1[,1]), as.integer(ts2[,1]), lag.max =10, plot =F, na.action=na.pass)$acf)
 [1] -0.0165759032546357876203 -0.0009315701778466996610 -0.0092450780124607306876 -0.0012831523310935632337 -0.0068458595845764941279  0.0116500945970494651505 -0.0049235745757881255180
 [8]  0.0013736907995123247284  0.0037099107611970050247  0.0000975349354166987759 -0.0000000000000000277556  0.0007827869094209904954  0.0003925162566637135919  0.0001810479989895477041
[15] -0.0007799161627975795263 -0.0010555407353524254299 -0.0022290547145371181204  0.0051118107350296843050  0.0016135741880074876142 -0.0025453295798825298357  0.0014905566679348520448

UPDATE

Since ccf() uses acf() internally, the difference can be reduced to:

> as.vector(acf(c(42, 5, 65437, 23), plot=F, lag.max=1)$acf)
[1]  1.000000 -0.416954
> cor(c(42, 5, 65437, 23), c(NA, 42, 5, 65437), use="pairwise.complete.obs")
[1] -0.500218
> cor(c(42, 5, 65437, 23), c(5, 65437, 23, NA), use="pairwise.complete.obs")
[1] -0.500218
Robert Kubrick asked May 06 '13 13:05


1 Answer

There are a couple of differences between cor and acf in your examples. Let's pick a more manageable (and already demeaned) example:

x = c(-2,-1,0,1,2)
acf(x, plot = F, lag.max = NULL)
# Autocorrelations of series ‘x’, by lag
#   0    1    2    3    4 
# 1.0  0.4 -0.1 -0.4 -0.4 

Here's how acf arrives at this, e.g. for lag=2:

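# numerator: products of the overlapping pairs only (x is already demeaned)
# denominator: sum of squares of the full series
acf_lag_2 = sum(x*c(x[c(-1,-2)],NA,NA), na.rm = T) /
            sqrt(sum(x*x)*sum(x*x))   # = -1/10 = -0.1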

Contrast this to what your cor construct would do:

cor(x, c(0,1,2,NA,NA), use="pairwise.complete.obs") # = cor(c(-2,-1,0), c(0,1,2)) = 1

cor_lag_2 = sum((c(-2,-1,0)+1)*(c(0,1,2)-1)) /   # recall cor needs to demean both vectors
            sqrt(sum(c(-1,0,1)*c(-1,0,1))*sum(c(-1,0,1)*c(-1,0,1)))

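So acf demeans only once, at the very beginning, using the mean of the full series, and it normalizes every lag by the full-series sum of squares. cor, by contrast, demeans and normalizes separately for each lag, using only the observations that overlap at that lag.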
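
To make the contrast concrete for any lag, here is a minimal sketch of both calculations. The helper names acf_by_hand and cor_by_hand are hypothetical (they are not part of R or of the code above), and the sketch assumes a plain numeric vector with no missing values and a lag k smaller than the series length.

```r
# Hypothetical helpers for illustration; not part of base R.

# Lag-k autocorrelation in the style of acf(): demean once with the
# full-series mean, then divide by the full-series sum of squares.
acf_by_hand <- function(x, k) {
  n <- length(x)
  d <- x - mean(x)
  sum(d[1:(n - k)] * d[(1 + k):n]) / sum(d^2)
}

# Lag-k correlation in the style of cor() on shifted copies: only the
# overlapping observations are used, and cor() demeans and normalizes
# each of the two shortened vectors on its own.
cor_by_hand <- function(x, k) {
  n <- length(x)
  cor(x[1:(n - k)], x[(1 + k):n])
}

x <- c(42, 5, 65437, 23)   # the four-element series from the question
acf_by_hand(x, 1)          # should agree with acf(x, plot = FALSE, lag.max = 1)$acf[2]
cor_by_hand(x, 1)          # should agree with the cor(..., use = "pairwise.complete.obs") call
```

On the question's series these should reproduce the two numbers reported there (about -0.417 from acf and about -0.500 from cor). The two estimates move closer together as the series gets longer relative to the lag, because the per-lag mean and variance then differ less from the full-series ones.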

eddi answered Nov 09 '22 03:11