Slightly off-topic question, but could anybody tell me when and how the cor() function was improved recently? It is much, much faster than I remember and is now comparable in speed to the rcorr() function in the Hmisc package, which was my alternative correlation function for large matrices.
Thanks for all the suggestions. After some investigation, the difference in speed is due to the use="pairwise" flag rather than an algorithmic change. There is a roughly 8-fold difference in speed when using this option.
The speed of cor() is comparable across R versions 2.4 through 2.13.
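For anyone who wants to check this on their own machine, here is a minimal sketch of the comparison. The matrix size and seed are arbitrary choices of mine, not the data I originally used, and the ratio you see will depend on your R version and hardware. I spell the option out as "pairwise.complete.obs"; cor() also accepts the abbreviation "pairwise" through partial matching.

```r
# Time cor() with the default 'use' setting against pairwise deletion.
# The data are simulated and contain no NAs, so both calls should
# return the same correlation matrix; only the timing should differ.
set.seed(1)
m <- matrix(rnorm(500 * 100), nrow = 500, ncol = 100)

t_default  <- system.time(r_default  <- cor(m))[["elapsed"]]
t_pairwise <- system.time(
  r_pairwise <- cor(m, use = "pairwise.complete.obs")
)[["elapsed"]]

# Elapsed seconds for each variant
print(c(default = t_default, pairwise = t_pairwise))

# Sanity check: identical results when nothing is missing
same_result <- isTRUE(all.equal(r_default, r_pairwise))
print(same_result)
```

On a matrix this small the timings may be too short to compare reliably; increasing the number of columns makes any difference easier to see.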
Thanks,
Iain
http://cran.r-project.org/src/base/NEWS.html has a high-level summary of recent changes, with explanations of their relevance. It is sometimes useful for picking up related changes in other functions that might affect what you're doing. A quick find for cor() only shows a couple of things, however:
2.13.0
The rank-correlation methods for cor() and cov() with use = "complete.obs" computed the ranks before removing missing values, whereas the documentation implied incomplete cases were removed first. (PR#14488: https://bugs.R-project.org/bugzilla3/show_bug.cgi?id=14488)
2.11.0
cor() and cov() now test for misuse with non-numeric arguments, such as the non-bug report PR#14207 (https://bugs.R-project.org/bugzilla3/show_bug.cgi?id=14207).
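To make those two NEWS entries concrete, here is a small sketch of what they mean in practice. The vectors are made up for illustration, and the code assumes a reasonably current R in which both changes are already in place; it is not taken from the R sources.

```r
# Illustrative data with missing values in different positions
x <- c(1, 5, NA, 3, 9, 2)
y <- c(2, NA, 4, 1, 8, 3)
ok <- complete.cases(x, y)

# Documented behaviour: drop incomplete cases first, then rank
remove_then_rank <- cor(rank(x[ok]), rank(y[ok]))

# The order described in the 2.13.0 entry: rank first, then drop
rank_then_remove <- cor(rank(x, na.last = "keep")[ok],
                        rank(y, na.last = "keep")[ok])

# What cor() itself returns for Spearman with complete.obs
builtin <- cor(x, y, method = "spearman", use = "complete.obs")

print(c(remove_then_rank = remove_then_rank,
        rank_then_remove = rank_then_remove,
        builtin = builtin))

# The 2.11.0 entry: non-numeric input is rejected with an error
non_numeric <- tryCatch(cor(letters[1:3], 1:3), error = function(e) e)
print(inherits(non_numeric, "error"))
```

If the fix is in place, builtin should agree with remove_then_rank, while rank_then_remove can differ because the ranks of the retained cases are no longer consecutive.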
Hard to say without knowing what version you're running, but it looks like there are some substantial changes coming in 2.14, and only minor changes between 2.13 and previous versions back to at least 2.10. Compare these to see the current changes coming in 2.14:
2.13 code: https://svn.r-project.org/R/branches/R-2-13-branch/src/main/cov.c
2.14 code: https://svn.r-project.org/R/branches/R-2-14-branch/src/main/cov.c