Why do I get different results using ccf() and cor() in R?




Why do I get different correlation results between cor() and ccf()?

> library(xts)
> set.seed(123)
> ts1 = xts(1:100, as.POSIXlt(1366039619, tz="", origin="1970-01-01") + rnorm(100, 0, 3))
> ts2 = xts(1:100, as.POSIXlt(1366039619, tz="", origin="1970-01-01") + rnorm(100, 0, 3))

> as.vector(ccf(as.integer(ts1[,1]), as.integer(ts2[,1]), lag.max =10, plot =F, na.action=na.pass)$acf)
 [1] -0.13747975 -0.00747975 -0.09497750 -0.01031203 -0.07564956  0.19881488 -0.11353135  0.01673867  0.12900690  0.00059706 -0.09642964  0.20852985  0.02476448  0.00126913 -0.03467147 -0.04284728 -0.05561356
[18]  0.08875188  0.01587159 -0.04449745  0.01002100

> sapply(seq(-10, 10), function(x, ts1, ts2) { cor(ts1[,1], lag(ts2[,1], x), use="complete.obs") }, ts1, ts2)
 [1] -0.154055651 -0.008411318 -0.104222576 -0.011595184 -0.082495425  0.210464976 -0.118454928  0.018112365  0.132716811  0.000694595 -0.096429643  0.209312640  0.025156993  0.001450175 -0.035451383
[16] -0.043902825 -0.057842616  0.093863686  0.017485161 -0.047042779  0.011511559

> sapply(seq(-10, 10), function(x, ts1, ts2) { cor(ts1[,1], lag(ts2[,1], x), use="complete.obs") }, ts1, ts2) - as.vector(ccf(as.integer(ts1[,1]), as.integer(ts2[,1]), lag.max =10, plot =F, na.action=na.pass)$acf)
 [1] -0.0165759032546357876203 -0.0009315701778466996610 -0.0092450780124607306876 -0.0012831523310935632337 -0.0068458595845764941279  0.0116500945970494651505 -0.0049235745757881255180
 [8]  0.0013736907995123247284  0.0037099107611970050247  0.0000975349354166987759 -0.0000000000000000277556  0.0007827869094209904954  0.0003925162566637135919  0.0001810479989895477041
[15] -0.0007799161627975795263 -0.0010555407353524254299 -0.0022290547145371181204  0.0051118107350296843050  0.0016135741880074876142 -0.0025453295798825298357  0.0014905566679348520448


Since ccf() uses acf(), the difference can be reduced to:

> as.vector(acf(c(42, 5, 65437, 23), plot=F, lag.max=1)$acf)
[1]  1.000000 -0.416954
> cor(c(42, 5, 65437, 23), c(NA, 42, 5, 65437), use="pairwise.complete.obs")
[1] -0.500218
> cor(c(42, 5, 65437, 23), c(5, 65437, 23, NA), use="pairwise.complete.obs")
[1] -0.500218

There are a couple of differences between cor and acf in your examples. Let's pick a more manageable (and already demeaned) example:

x = c(-2,-1,0,1,2)
acf(x, plot = F, lag.max = NULL)
# Autocorrelations of series ‘x’, by lag
#   0    1    2    3    4 
# 1.0  0.4 -0.1 -0.4 -0.4 

Here's how acf arrives at this, e.g. for lag=2:

acf_lag_2 = sum(x*c(x[c(-1,-2)],NA,NA), na.rm = T) /
            sum(x^2)   # = -1/10 = -0.1; x is already demeaned, so sum(x^2) is the lag-0 term

Contrast this to what your cor construct would do:

cor(x, c(0,1,2,NA,NA), use="pairwise.complete.obs") # = cor(c(-2,-1,0), c(0,1,2)) = 1

cor_lag_2 = sum((c(-2,-1,0)+1)*(c(0,1,2)-1)) /   # recall cor needs to demean both vectors
            sqrt(sum((c(-2,-1,0)+1)^2) * sum((c(0,1,2)-1)^2))   # = 2/2 = 1

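So acf demeans only once, at the very beginning, using the mean of the full series, and it normalizes every lag by the same lag-0 sum of squares. cor, on the other hand, demeans and normalizes separately for each lag, using only the observations that overlap at that lag.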
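
To check the two computations side by side, here is a minimal sketch. The helper names manual_acf and per_lag_cor are made up for illustration and are not part of base R; the sketch assumes acf() is called with its defaults (demean = TRUE, type = "correlation").

```r
# Hypothetical helpers; the names are illustrative only.

# acf-style estimate: demean once with the full-series mean,
# then divide by the sum of squares of the full demeaned series.
manual_acf <- function(x, k) {
  n <- length(x)
  d <- x - mean(x)
  sum(d[1:(n - k)] * d[(1 + k):n]) / sum(d^2)
}

# cor-style estimate: keep only the overlapping pairs at lag k,
# then let cor() demean and scale each piece separately.
per_lag_cor <- function(x, k) {
  n <- length(x)
  cor(x[1:(n - k)], x[(1 + k):n])
}

x <- c(-2, -1, 0, 1, 2)
lags <- 0:3  # lag 4 leaves a single pair, for which cor() is undefined

acf_builtin <- as.vector(acf(x, plot = FALSE, lag.max = 3)$acf)
acf_manual  <- sapply(lags, manual_acf, x = x)
cor_per_lag <- sapply(lags, per_lag_cor, x = x)

# One row per method, one column per lag
print(rbind(acf_builtin, acf_manual, cor_per_lag))

# The four-element series from the question
y <- c(42, 5, 65437, 23)
manual_acf(y, 1)   # compare with acf(y, plot = FALSE, lag.max = 1)$acf[2]
per_lag_cor(y, 1)  # compare with the cor(..., use = "pairwise.complete.obs") calls
```

For the linear series x, the overlapping pieces are perfectly correlated at every lag, so the cor row is all ones, while the acf rows decay. That makes the effect of the shared mean and shared normalization easy to see.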

