I am having trouble with the stats::lag function when using the
dplyr package. Specifically, I get different results from the lag
function before and after loading dplyr.
For example, here is a sample time series. If I calculate the lag with
k = -1, the lagged series starts in 1971.
data <- ts(1:10, start = 1970, frequency = 1)
lag1 <- stats::lag(data, k = -1)
start(lag1)[1]
## [1] 1971
Now, if I load dplyr, the same call yields a lagged series starting in
1970.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
lag2 <- stats::lag(data, k = -1)
start(lag2)[1]
## [1] 1970
start(lag1)[1] == start(lag2)[1]
## [1] FALSE
Given the masking messages printed when loading dplyr, my guess is that
this has to do with environments. But detaching dplyr doesn't seem to help.
detach("package:dplyr", unload = TRUE, character.only = TRUE)
lag3 <- stats::lag(data, k = -1)
start(lag3)[1]
## [1] 1970
start(lag1)[1] == start(lag3)[1]
## [1] FALSE
Any suggestions are greatly appreciated. My only solution so far is to
restart the R session between calculating lag1 and lag2.
Here's my session:
## setting value
## version R version 3.2.0 (2015-04-16)
## system i386, mingw32
## ui RTerm
## language (EN)
## collate English_Canada.1252
## tz America/New_York
##
## package * version date source
## assertthat 0.1 2013-12-06 CRAN (R 3.2.0)
## bitops 1.0-6 2013-08-17 CRAN (R 3.2.0)
## DBI 0.3.1 2014-09-24 CRAN (R 3.2.0)
## devtools 1.8.0 2015-05-09 CRAN (R 3.2.0)
## digest 0.6.8 2014-12-31 CRAN (R 3.2.0)
## dplyr 0.4.1 2015-01-14 CRAN (R 3.2.0)
## evaluate 0.7 2015-04-21 CRAN (R 3.2.0)
## formatR 1.2 2015-04-21 CRAN (R 3.2.0)
## git2r 0.10.1 2015-05-07 CRAN (R 3.2.0)
## htmltools 0.2.6 2014-09-08 CRAN (R 3.2.0)
## httr * 0.6.1 2015-01-01 CRAN (R 3.2.0)
## knitr 1.10.5 2015-05-06 CRAN (R 3.2.0)
## magrittr 1.5 2014-11-22 CRAN (R 3.2.0)
## memoise 0.2.1 2014-04-22 CRAN (R 3.2.0)
## Rcpp 0.11.6 2015-05-01 CRAN (R 3.2.0)
## RCurl 1.95-4.6 2015-04-24 CRAN (R 3.2.0)
## rmarkdown 0.6.1 2015-05-07 CRAN (R 3.2.0)
## rversions 1.0.0 2015-04-22 CRAN (R 3.2.0)
## stringi 0.4-1 2014-12-14 CRAN (R 3.2.0)
## stringr 1.0.0 2015-04-30 CRAN (R 3.2.0)
## XML 3.98-1.1 2013-06-20 CRAN (R 3.2.0)
## yaml 2.1.13 2014-06-12 CRAN (R 3.2.0)
I've also tried unloadNamespace, as suggested by @BondedDust:
unloadNamespace("dplyr")
lag4 <- stats::lag(data, k = -1)
## Warning: namespace 'dplyr' is not available and has been replaced
## by .GlobalEnv when processing object 'sep'
start(lag4)[1]
## [1] 1970
start(lag1)[1] == start(lag4)[1]
## [1] FALSE
The dplyr package is effectively overwriting the method behind 'lag'. stats::lag itself is only an S3 generic; the real work is done by lag.default, and there are now two copies of it, one in 'stats' and one in 'dplyr', with the 'dplyr' copy being found first by the dispatch mechanism. You can force the stats version to be used by calling it directly with the ::: mechanism:
> lag2 <- stats::lag.default(data, k = -1)
Error: 'lag.default' is not an exported object from 'namespace:stats'
> lag2 <- stats:::lag.default(data, k = -1)
> stats::start(lag2)[1]
[1] 1971
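If you need this in more than one place, the direct call can be wrapped in a small helper. This is only a sketch: the name ts_lag is made up for illustration, and it assumes you always want the stats method for ts objects regardless of what else is loaded.

```r
# Hypothetical helper (the name ts_lag is not part of any package):
# lag a ts object with the stats method, bypassing S3 dispatch entirely.
ts_lag <- function(x, k = 1) {
  stopifnot(stats::is.ts(x))
  # lag.default is not exported from stats, so ::: is required here
  stats:::lag.default(x, k = k)
}

data <- ts(1:10, start = 1970, frequency = 1)
lagged <- ts_lag(data, k = -1)
start(lagged)[1]
## [1] 1971
```

Because the helper names the method explicitly, the result should not depend on whether dplyr was loaded earlier in the session. Note that ::: is fine for interactive work but is generally discouraged inside package code.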
dplyr:::lag.default does not use the time-series-specific functions, which is why the start of the lagged series stays at 1970. I'm not able to explain why unloadNamespace fails to remove the function's definition, but it is still there:
> unloadNamespace("dplyr")
> getAnywhere(lag.default)
2 differing objects matching ‘lag.default’ were found
in the following places
registered S3 method for lag from namespace dplyr
namespace:stats
Use [] to view one of them
Further weirdness: after unloading the dplyr namespace I see this:
> environment(getAnywhere(lag.default)[1])
<environment: namespace:dplyr>
> environment(getAnywhere(lag.default)[2])
<environment: namespace:dplyr>
> environment(getAnywhere(lag.default)[3])
<environment: namespace:stats>
(And then restarting and loading dplyr, I see the same apparent double-entry.)
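A quicker way to check which copy of lag.default the current session would use is to ask for the S3 method and look at where it was defined. This is a rough diagnostic sketch that only uses utils::getS3method and environmentName; the variable name m is just for illustration.

```r
# Look up the lag.default method that S3 lookup finds in this session
m <- utils::getS3method("lag", "default")

# Name of the namespace the method was defined in
environmentName(environment(m))
```

In a fresh session with only the base packages loaded this prints "stats". In a session affected by the problem above I would expect it to point at dplyr instead, but I have not checked what it prints after the namespace has been unloaded.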
There's also something weird about the help page for dplyr::lag:
> help(lag,pac=dplyr)
No documentation for ‘lag’ in specified packages and libraries:
you could try ‘??lag’
> help(`lag`,pac=`dplyr`)
No documentation for ‘lag’ in specified packages and libraries:
you could try ‘??lag’
> help(`lag.default`,pac=`dplyr`) # This finally succeeds!
Looking at GitHub (after confirming that I had the latest version of dplyr on CRAN), I see that this was an issue for the R CMD check process: https://github.com/hadley/dplyr/commit/f8a46e030b7b899900f2091f41071619d0a46288 . Apparently lag.default will not be overwritten in future versions, but lag will mask the stats version. I wonder what happens to lag.zoo and lag.zooreg. Maybe it will also announce that overwriting or masking when the package is loaded?