I have noticed that a file called 'MD5' is present in the directories of many R packages that I have downloaded. However, I cannot find any mention of it in the 'Writing R Extensions' manual. It lists the MD5 hash and filename for different files in the package. What is this file used for? Should it be something I include in my packages? How can it be generated?
The MD5 hash found in R packages is used to uniquely identify the package source on a repository (e.g. CRAN).
Specifically, when the package is listed in a repo, the metadata of the package is added to a file called PACKAGES. When a user requests a package via install.packages(), a download is triggered and checked against the MD5 hash. This is stated in the ?md5sum help page:
MD5 sums are used as a check that R packages have been unpacked correctly and not subsequently modified.
The inside of a PACKAGES file would look like:
Package: datapkg
Version: 2.0.0
Depends: R (>= 3.2)
License: file LICENSE
MD5sum: 22797605db853f5f4c2c5612da366b53
NeedsCompilation: no
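As a rough illustration of how an MD5sum field like the one above can be compared with a file on disk, here is a minimal sketch. It is not what install.packages() does internally; the temporary "tarball" is a stand-in file created only for the example, and the PACKAGES record is rebuilt around its hash rather than taken from a real repository.

```r
# Stand-in for a downloaded source tarball (hypothetical content, not a real package).
tarball <- tempfile(fileext = ".tar.gz")
writeLines("pretend this is datapkg_2.0.0.tar.gz", tarball)

# Build a PACKAGES-style record whose MD5sum field matches the stand-in file.
packages_file <- tempfile()
writeLines(c(
  "Package: datapkg",
  "Version: 2.0.0",
  paste("MD5sum:", unname(tools::md5sum(tarball))),
  "NeedsCompilation: no"
), packages_file)

# PACKAGES uses the Debian control file (DCF) format, so read.dcf() can parse it.
index <- read.dcf(packages_file)
expected <- unname(index[index[, "Package"] == "datapkg", "MD5sum"])

# Compare the advertised hash with the hash of the file on disk.
actual <- unname(tools::md5sum(tarball))
hash_matches <- identical(expected, actual)
print(hash_matches)
```

If the file were altered after the record was written, the two hashes would no longer agree.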
For more information on how repos work with install.packages(), please see the post that I wrote:
http://thecoatlessprofessor.com/programming/r-data-packages-in-external-data-repositories-using-the-additional_repositories-field/
The file is used as input to tools::checkMD5sums(), which checks the integrity of the installed package. The format can be reverse engineered from the code: a text file that has a line for each included file, containing the MD5 hash, a * separator, and the file path relative to the specified root directory. You can create these by hand from the output of tools::md5sum(), or you can use a function that I have provided in this Gist, where I also discuss this in more detail.
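For illustration, here is a minimal sketch of generating such a file and checking it. The helper write_md5_file() is a hypothetical name introduced for this example (it is not the function from the Gist), and the demo "package" is just a temporary directory with two small files.

```r
# Hypothetical helper: write an MD5 file listing "<hash> *<relative path>"
# for every file under pkg_dir (excluding the MD5 file itself).
write_md5_file <- function(pkg_dir) {
  files <- setdiff(list.files(pkg_dir, recursive = TRUE), "MD5")
  sums <- tools::md5sum(file.path(pkg_dir, files))
  writeLines(paste0(unname(sums), " *", files), file.path(pkg_dir, "MD5"))
  invisible(file.path(pkg_dir, "MD5"))
}

# Demo directory standing in for an installed package.
pkg_dir <- tempfile("demo_pkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE)
writeLines(c("Package: demo", "Version: 0.0.1"), file.path(pkg_dir, "DESCRIPTION"))
writeLines("hello <- function() 'hi'", file.path(pkg_dir, "R", "hello.R"))

write_md5_file(pkg_dir)

# Check the directory against its MD5 file; the package name is not used
# to locate anything when `dir` is supplied.
ok_before <- tools::checkMD5sums("demo", dir = pkg_dir)

# Modify a file after the MD5 file was written and check again.
writeLines("hello <- function() 'changed'", file.path(pkg_dir, "R", "hello.R"))
ok_after <- tools::checkMD5sums("demo", dir = pkg_dir)

print(c(before = ok_before, after = ok_after))
```

The first check should pass and the second should report the modified file, which is the kind of tampering or unpacking error the MD5 file is meant to catch.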