
Use citation() in R Markdown to automatically generate a bibliography of R packages

Tags:

r

r-markdown

I would like to cite the R packages used in a project but since they are quite numerous, I think it would be a good idea to create two separate reference sections: one with the references of my specific domain and one with the references for the R packages.

My first idea would be to check if I can export all the citations of the packages used at once in a .bib file, but I'm not sure that R Markdown can handle both the .bib file with the bibliography of papers specific to my domain and the .bib file for the R packages.

Since the functions citation() and toBibtex() generate BibTeX citations, I thought it might be possible to generate the reference section dedicated to the R packages with these functions directly in the .Rmd file. However, it does not seem possible to automatically format a reference when these commands are included in an R Markdown chunk.

Here's a reproducible example of the thing I'm trying to do:

---
title: "Cite R packages"
author: ""
date: "01/02/2020"
output: pdf_document
bibliography: test.bib
---

This is a citation of a paper: @mayer2011.

# Bibliography {-}
\setlength{\parindent}{-0.2in}
\setlength{\leftskip}{0.2in}
\noindent
<div id="refs"></div>
```{r refmgr references, results="asis", echo=FALSE}
# Print
```
\setlength{\parindent}{0in}
\setlength{\leftskip}{0in}
\setlength{\parskip}{0pt}

# Bibliography for R packages {-}
```{r}
citation("dplyr")
toBibtex(citation("dplyr"))
```

and here's the content of test.bib:

@article{mayer2011,
  title = {Notes on {{CEPII}}'s {{Distances Measures}}: {{The GeoDist Database}}},
  shorttitle = {Notes on {{CEPII}}'s {{Distances Measures}}},
  journal = {SSRN Electronic Journal},
  doi = {10.2139/ssrn.1994531},
  author = {Mayer, Thierry and Zignago, Soledad},
  year = {2011}
}

Any idea about how to easily include the references of the R packages in a separate reference section?

EDIT: see here for another solution.

Asked by bretauv on Feb 02 '20 at 12:02



1 Answer

There are two separate though related problems here:

  1. How to cite a package programmatically
  2. How to have two separate reference sections in your markdown document

There are solutions to both of them, which I'll go over in turn:


How to cite a package programmatically

The key here is realising that Pandoc only writes your document after the R code chunks have run. This gives you the opportunity to write a .bib file programmatically as part of your R Markdown document; Pandoc reads that file only at the document creation stage.

It also depends on being able to use two .bib files in your bibliography. This is also possible, but we'll leave that issue for now.
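As a minimal sketch of that idea, using only base R: the chunk below writes one BibTeX entry to a file that Pandoc could pick up later. The key `R-stats` and the file name `packages-sketch.bib` are arbitrary choices for illustration, and the file goes to a temporary directory here rather than next to the document.

```r
# "stats" ships with R, so citation("stats") returns the citation for R itself
bib_lines <- as.character(utils::toBibtex(utils::citation("stats")))

# citation() leaves the entry key empty ("@Manual{,"); fill it in so the
# entry can be cited as @R-stats (the key name is an assumption of this sketch)
bib_lines[1] <- sub("{,", "{R-stats,", bib_lines[1], fixed = TRUE)

# Hypothetical output location; in a real document this would sit beside the .Rmd
bib_file <- file.path(tempdir(), "packages-sketch.bib")
writeLines(enc2utf8(bib_lines), bib_file, useBytes = TRUE)
```

Because the chunk runs during knitting, the file exists by the time Pandoc resolves the citations.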

What you need is a function that will take package names, get the BibTeX-formatted citations, paste them all together and save them as a .bib file. I have written an example function here to show how that could be done.

This function has to handle packages that return several BibTeX entries, and it automatically inserts the package name as the BibTeX key so that you can cite any package in your markdown with @packagename. It uses non-standard evaluation and the ... argument so that you don't need to quote the package names or wrap them in c():

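citeR <- function(...)
{
  # Capture the unquoted package names passed via ... as character strings
  packages <- unlist(lapply(as.list(match.call()), deparse))[-1]
  Rbibs <- ""

  for(package in packages)
  {
    # Printed citation with its BibTeX entries, one element per output line
    Rbib <- capture.output(print(citation(package), bibtex = TRUE))

    # Split out each entry, from its "  @Type{" line to its closing "  }".
    # Map() always returns a list, so one entry or several are handled alike
    # (mapply() may simplify to a matrix, and class(x) == "matrix" is an
    # error inside if() from R 4.2 on).
    Rbib <- Map(function(x, y) Rbib[x:y],
                grep("  @.+[{]", Rbib),
                which(Rbib == "  }"))

    # Insert the package name as the citation key, then collapse each entry
    Rbib <- unlist(lapply(Rbib, function(x) {
      x[1] <- sub(",", paste0(package, ","), x[1], fixed = TRUE)
      paste0(x, collapse = "\n")
    }))

    # Keep one entry per package, preferring the @Manual entry if there is one
    if(length(Rbib) > 1) {
      if(any(grepl("@Manual", Rbib))) {
        Rbib <- Rbib[grep("@Manual", Rbib)][1]
      } else {
        Rbib <- Rbib[1]
      }
    }

    Rbibs <- paste(Rbibs, Rbib, sep = "\n\n")
  }

  # Write the result as UTF-8 (utf8::as_utf8() needs the utf8 package)
  writeBin(charToRaw(utf8::as_utf8(Rbibs)), "packages.bib")
}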

To use it, put it in an R chunk with echo = FALSE and call it like this:

citeR(dplyr, ggplot2, knitr, pROC)
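
The unquoted names work because the arguments are never evaluated, only captured as text. A small stand-alone sketch of that mechanism follows; `pkg_names` is a hypothetical helper written for illustration, and it uses `substitute()` rather than the `match.call()` form in citeR, which has the same effect here.

```r
# Turn unquoted arguments passed through ... into a character vector
pkg_names <- function(...) {
  # substitute(list(...)) is the unevaluated call list(dplyr, ggplot2, ...);
  # dropping the first element removes the `list` symbol itself
  args <- as.list(substitute(list(...)))[-1]
  vapply(args, deparse, character(1))
}

pkg_names(dplyr, ggplot2)  # returns c("dplyr", "ggplot2")
```

Neither package has to be installed for this call to succeed, since the symbols are only deparsed, never looked up.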

How to have two reference sections

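I cannot take credit for this part of the answer, which I got from here. It is more involved than the first part. First of all, you must use a Lua filter, which requires versions of rmarkdown and Pandoc recent enough to support Lua filters, so please update them if the filter is not picked up. The filter as written also calls the external pandoc-citeproc program, so that must be installed; Pandoc 2.11 and later have citeproc built in and no longer bundle it.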

The rationale for the lua filter is described in the provided link, but I will include it here with full acknowledgement to @tarleb. You must save the following file as multiple-bibliographies.lua in the same directory as your markdown:

-- file: multiple-bibliographies.lua

--- collection of all cites in the document
local all_cites = {}
--- document meta value
local doc_meta = pandoc.Meta{}

--- Create a bibliography for a given topic. This acts on all divs whose ID
-- starts with "refs", followed by nothings but underscores and alphanumeric
-- characters.
local function create_topic_bibliography (div)
  local name = div.identifier:match('^refs([_%w]*)$')
  if not name then
    return nil
  end
  local tmp_blocks = {
    pandoc.Para(all_cites),
    pandoc.Div({}, pandoc.Attr('refs')),
  }
  local tmp_meta = pandoc.Meta{bibliography = doc_meta['bibliography' .. name]}
  local tmp_doc = pandoc.Pandoc(tmp_blocks, tmp_meta)
  local res = pandoc.utils.run_json_filter(tmp_doc, 'pandoc-citeproc')
  -- first block of the result contains the dummy para, second is the refs Div
  div.content = res.blocks[2].content
  return div
end

local function resolve_doc_citations (doc)
  -- combine all bibliographies
  local meta = doc.meta
  local orig_bib = meta.bibliography
  meta.bibliography = pandoc.MetaList{orig_bib}
  for name, value in pairs(meta) do
    if name:match('^bibliography_') then
      table.insert(meta.bibliography, value)
    end
  end
  doc = pandoc.utils.run_json_filter(doc, 'pandoc-citeproc')
  doc.meta.bibliography = orig_bib -- restore to original value
  return doc
end

return {
  {
    Cite = function (c) all_cites[#all_cites + 1] = c end,
    Meta = function (m) doc_meta = m end,
  },
  {Pandoc = resolve_doc_citations,},
  {Div = create_topic_bibliography,}
}

To get this to work, your YAML header should look like this:

---
title: "Cite R packages"
author: ''
date: "01/02/2020"
output:
  pdf_document:
    pandoc_args: --lua-filter=multiple-bibliographies.lua
bibliography_software: packages.bib
bibliography_normal: test.bib
---

Note that packages.bib doesn't need to exist when you start knitting the document, since it will be created before Pandoc is called.

To insert the reference sections, you need to put these HTML snippets at the appropriate points of your markdown:

<div id = "refs_normal"></div>

and

<div id = "refs_software"></div>

Putting it all together

I know this is already a long answer, but I thought it would be good to include a full working example and show the pdf output:

---
title: "Cite R packages"
author: ''
date: "01/02/2020"
output:
  pdf_document:
    pandoc_args: --lua-filter=multiple-bibliographies.lua
bibliography_software: packages.bib
bibliography_normal: test.bib
---

This is a citation of a paper: @mayer2011.
This is a citation of an R package @dplyr
And another @ggplot2 and another @knitr plus @pROC

# Bibliography{-}
\setlength{\parindent}{-0.2in}
\setlength{\leftskip}{0.2in}
\noindent
<div id = "refs_normal"></div>
\setlength{\parindent}{0in}
\setlength{\leftskip}{0in}
\setlength{\parskip}{0pt}

# Software used{-}
\setlength{\parindent}{-0.2in}
\setlength{\leftskip}{0.2in}
\noindent
<div id = "refs_software"></div>
\setlength{\parindent}{0in}
\setlength{\leftskip}{0in}
\setlength{\parskip}{0pt}

```{r citeR, echo=FALSE}

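citeR <- function(...)
{
  # Capture the unquoted package names passed via ... as character strings
  packages <- unlist(lapply(as.list(match.call()), deparse))[-1]
  Rbibs <- ""

  for(package in packages)
  {
    # Printed citation with its BibTeX entries, one element per output line
    Rbib <- capture.output(print(citation(package), bibtex = TRUE))

    # Split out each entry, from its "  @Type{" line to its closing "  }".
    # Map() always returns a list, so one entry or several are handled alike
    # (mapply() may simplify to a matrix, and class(x) == "matrix" is an
    # error inside if() from R 4.2 on).
    Rbib <- Map(function(x, y) Rbib[x:y],
                grep("  @.+[{]", Rbib),
                which(Rbib == "  }"))

    # Insert the package name as the citation key, then collapse each entry
    Rbib <- unlist(lapply(Rbib, function(x) {
      x[1] <- sub(",", paste0(package, ","), x[1], fixed = TRUE)
      paste0(x, collapse = "\n")
    }))

    # Keep one entry per package, preferring the @Manual entry if there is one
    if(length(Rbib) > 1)
    {
      if(any(grepl("@Manual", Rbib)))
      {
        Rbib <- Rbib[grep("@Manual", Rbib)][1]
      }
      else
      {
        Rbib <- Rbib[1]
      }
    }

    Rbibs <- paste(Rbibs, Rbib, sep = "\n\n")
  }

  # Write the result as UTF-8 (utf8::as_utf8() needs the utf8 package)
  writeBin(charToRaw(utf8::as_utf8(Rbibs)), "packages.bib")
}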

citeR(dplyr, ggplot2, knitr, pROC)

```

and test.pdf looks like this:

[Image of the rendered PDF output]

If you would rather automatically cite any packages you use, you could programmatically scrape the names from any calls to library() in your markdown document. Since the workflow for achieving your goal is a little convoluted, you might want to consider creating a small package with the citeR function, the Lua filter and your own get_lib_citations_from_library_calls("my_markdown.Rmd") function that automates all of this.
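
A rough sketch of the scraping step, under some loud assumptions: `find_library_calls` is a hypothetical name, it matches `library(pkg)` and `require(pkg)` with a regular expression rather than parsing the code, and so it will also pick up calls in comments or prose and miss packages used only via `pkg::fun`.

```r
# Collect package names from library()/require() calls in a text file
find_library_calls <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Optional quotes around the name; the package name is capture group 1
  pattern <- "\\b(?:library|require)\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)[\"']?"
  # All matching substrings across all lines, e.g. 'library(dplyr'
  calls <- unlist(regmatches(lines, gregexpr(pattern, lines, perl = TRUE)))
  # Reduce each match to the captured package name and drop duplicates
  unique(sub(pattern, "\\1", calls, perl = TRUE))
}
```

The resulting character vector could then be looped over with citation() in the same way citeR loops over its own `packages` vector.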

Answered by Allan Cameron on Oct 21 '22 at 10:10