Why do I get different correlation results between cor() and ccf()?
> library(xts)
> set.seed(123)
> ts1 = xts(1:100, as.POSIXlt(1366039619, tz="", origin="1970-01-01") + rnorm(100, 0, 3))
> ts2 = xts(1:100, as.POSIXlt(1366039619, tz="", origin="1970-01-01") + rnorm(100, 0, 3))
> as.vector(ccf(as.integer(ts1[,1]), as.integer(ts2[,1]), lag.max =10, plot =F, na.action=na.pass)$acf)
[1] -0.13747975 -0.00747975 -0.09497750 -0.01031203 -0.07564956 0.19881488 -0.11353135 0.01673867 0.12900690 0.00059706 -0.09642964 0.20852985 0.02476448 0.00126913 -0.03467147 -0.04284728 -0.05561356
[18] 0.08875188 0.01587159 -0.04449745 0.01002100
> sapply(seq(-10, 10), function(x, ts1, ts2) { cor(ts1[,1], lag(ts2[,1], x), use="complete.obs") }, ts1, ts2)
[1] -0.154055651 -0.008411318 -0.104222576 -0.011595184 -0.082495425 0.210464976 -0.118454928 0.018112365 0.132716811 0.000694595 -0.096429643 0.209312640 0.025156993 0.001450175 -0.035451383
[16] -0.043902825 -0.057842616 0.093863686 0.017485161 -0.047042779 0.011511559
> sapply(seq(-10, 10), function(x, ts1, ts2) { cor(ts1[,1], lag(ts2[,1], x), use="complete.obs") }, ts1, ts2) - as.vector(ccf(as.integer(ts1[,1]), as.integer(ts2[,1]), lag.max =10, plot =F, na.action=na.pass)$acf)
[1] -0.0165759032546357876203 -0.0009315701778466996610 -0.0092450780124607306876 -0.0012831523310935632337 -0.0068458595845764941279 0.0116500945970494651505 -0.0049235745757881255180
[8] 0.0013736907995123247284 0.0037099107611970050247 0.0000975349354166987759 -0.0000000000000000277556 0.0007827869094209904954 0.0003925162566637135919 0.0001810479989895477041
[15] -0.0007799161627975795263 -0.0010555407353524254299 -0.0022290547145371181204 0.0051118107350296843050 0.0016135741880074876142 -0.0025453295798825298357 0.0014905566679348520448
UPDATE
Since ccf() uses acf() internally, the difference can be reduced to:
> as.vector(acf(c(42, 5, 65437, 23), plot=F, lag.max=1)$acf)
[1] 1.000000 -0.416954
> cor(c(42, 5, 65437, 23), c(NA, 42, 5, 65437), use="pairwise.complete.obs")
[1] -0.500218
> cor(c(42, 5, 65437, 23), c(5, 65437, 23, NA), use="pairwise.complete.obs")
[1] -0.500218
There are a couple of differences between cor and acf in your examples. Let's pick a more manageable (and already demeaned) example:
x = c(-2,-1,0,1,2)
acf(x, plot = F, lag.max = NULL)
# Autocorrelations of series ‘x’, by lag
# 0 1 2 3 4
# 1.0 0.4 -0.1 -0.4 -0.4
Here's how acf arrives at this, e.g. for lag=2:
acf_lag_2 = sum(x*c(x[c(-1,-2)],NA,NA), na.rm = TRUE) /  # lagged products; x is already demeaned
            sqrt(sum(x*x)*sum(x*x))                       # full-series normalization: -1/10 = -0.1
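The same calculation can be written for any lag. The following is only a sketch: manual_acf is a hypothetical helper name (it is not part of base R), and it assumes a plain numeric vector with no missing values. It demeans once with the full-series mean and divides by the full-series sum of squares, then compares the result with acf:

```r
# Hypothetical helper (not part of base R): sample autocorrelation at lag k,
# assuming x is a numeric vector without NAs.
manual_acf <- function(x, k) {
  n <- length(x)
  d <- x - mean(x)                                  # demean once, with the full-series mean
  num <- sum(d[seq_len(n - k)] * d[(1 + k):n])      # sum of lagged cross-products
  num / sum(d * d)                                  # normalize by the lag-0 sum of squares
}

x <- c(-2, -1, 0, 1, 2)
manual  <- sapply(0:4, function(k) manual_acf(x, k))
builtin <- as.vector(acf(x, plot = FALSE, lag.max = 4)$acf)

print(manual)   # 1.0 0.4 -0.1 -0.4 -0.4
print(builtin)  # same values as above
```

The 1/n factors that acf applies to the numerator and the denominator cancel, which is why the sketch can leave them out.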
Contrast this with what your cor construct would do:
cor(x, c(0,1,2,NA,NA), use="pairwise.complete.obs") # = cor(c(-2,-1,0), c(0,1,2)) = 1
cor_lag_2 = sum((c(-2,-1,0)+1)*(c(0,1,2)-1)) /  # cor demeans each vector by its own mean (-1 and 1)
            sqrt(sum(c(-1,0,1)*c(-1,0,1))*sum(c(-1,0,1)*c(-1,0,1)))  # = 2/2 = 1
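So acf demeans only once, at the very beginning, using the mean of the whole series, and it normalizes every lag by the same full-series sum of squares. cor, by contrast, demeans and normalizes separately for each lag, using only the observations that overlap at that lag.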
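Applied to the four-element vector from the question's update, the two recipes can be written side by side. This is a minimal sketch, assuming lag 1 and no missing values; the variable names acf_style and cor_style are made up for illustration:

```r
x <- c(42, 5, 65437, 23)
n <- length(x)
k <- 1

# acf-style: one global mean and one global sum of squares for every lag
d <- x - mean(x)
acf_style <- sum(d[seq_len(n - k)] * d[(1 + k):n]) / sum(d^2)

# cor-style: each overlapping segment gets its own mean and its own scale
a <- x[seq_len(n - k)]   # 42, 5, 65437
b <- x[(1 + k):n]        # 5, 65437, 23
cor_style <- sum((a - mean(a)) * (b - mean(b))) /
  sqrt(sum((a - mean(a))^2) * sum((b - mean(b))^2))

print(acf_style)  # about -0.416954, matching acf(x, plot=FALSE, lag.max=1)$acf[2]
print(cor_style)  # about -0.500218, matching cor(a, b)
```

With a long, roughly stationary series the segment means and variances are close to the full-series ones, so the two results differ only slightly, as in the first example of the question. With four points and one huge outlier the gap becomes large.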