
Merging existing PDF files using R

Tags: r, pdf

I want to merge PDF files that already exist (already saved on my computer) using R.

I already tried open-source software to merge them, and it works fine. But since I have a couple hundred files to merge, I was hoping to find something a little faster (my goal is to have the file created or updated automatically, simply by running an R command).

I am used to R, so I would like to find a way to create this new multi-page PDF with it. Is there a function that could do that for me?

Thanks!

sts asked Jul 09 '13 16:07


2 Answers

For an R-based solution which doesn't rely on calling the underlying OS with system() or system2(), I would recommend the {qpdf} package.

You can install the package with:

install.packages("qpdf")

You'll then want to use the pdf_combine() function. See its documentation with:

?qpdf::pdf_combine

You can then merge as many PDFs as you like. Here I merge file.pdf, file2.pdf and file3.pdf into a new file called output.pdf:

qpdf::pdf_combine(input = c("file.pdf", "file2.pdf", "file3.pdf"),
                  output = "output.pdf")
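
Since the question involves a couple hundred files, typing every name by hand is impractical. Below is a minimal sketch, assuming all the PDFs sit in a single folder; list_pdfs() and merge_pdf_folder() are hypothetical helper names, not part of {qpdf}. Only list.files() from base R and qpdf::pdf_combine() are real calls.

```r
# Hypothetical helper: collect every PDF in a folder.
# list.files() already returns its results in alphabetical order.
list_pdfs <- function(dir = ".") {
  list.files(dir, pattern = "\\.pdf$", full.names = TRUE, ignore.case = TRUE)
}

# Hypothetical wrapper: merge all PDFs found in `dir` into `output`.
# `combine` defaults to qpdf::pdf_combine and is only looked up when called.
merge_pdf_folder <- function(dir, output, combine = qpdf::pdf_combine) {
  inputs <- list_pdfs(dir)
  # Do not feed a previous run's output back in as an input.
  inputs <- inputs[normalizePath(inputs, mustWork = FALSE) !=
                     normalizePath(output, mustWork = FALSE)]
  if (length(inputs) == 0) {
    stop("No PDF files found in ", dir)
  }
  combine(input = inputs, output = output)
}
```

It might be called as merge_pdf_folder("reports", "reports/combined.pdf"), where the folder name is only an example. Note that alphabetical order puts file10.pdf before file2.pdf, so zero-padded file names (file02.pdf) or an explicit reordering of the vector may be needed if page order matters.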
jwalton answered Oct 09 '22 13:10


If you install the pdftk command-line tool, you can use the function below:

concatenate_pdfs <- function(input_filepaths, output_filepath) {
  # Take the filepath arguments and format them for use in a system command
  quoted_names <- paste0('"', input_filepaths, '"')
  file_list <- paste(quoted_names, collapse = " ")
  output_filepath <- paste0('"', output_filepath, '"')
  # Construct a system command to pdftk
  system_command <- paste("pdftk",
                          file_list,
                          "cat",
                          "output",
                          output_filepath,
                          sep = " ")
  # Invoke the command
  system(command = system_command)
}

It could be called as follows:

concatenate_pdfs(input_filepaths = c("My First File.pdf", "My Second File.pdf"),
                 output_filepath = "My Combined File.pdf")

This is just a user-friendly way of invoking the following system command:

pdftk "My First File.pdf" "My Second File.pdf" cat output "My Combined File.pdf"
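
A possible variation on the function above, offered as a sketch rather than a drop-in replacement: it passes the arguments through system2() with shQuote() instead of pasting double quotes by hand, and it checks that pdftk is on the PATH and that the input files exist. The names pdftk_cat_args() and concatenate_pdfs2() are hypothetical; the pdftk syntax is the same "cat output" form shown above.

```r
# Hypothetical helper: build the argument vector for pdftk,
# quoting each path for the current platform's shell.
pdftk_cat_args <- function(input_filepaths, output_filepath) {
  c(shQuote(input_filepaths), "cat", "output", shQuote(output_filepath))
}

# Hypothetical variant of concatenate_pdfs() using system2().
concatenate_pdfs2 <- function(input_filepaths, output_filepath) {
  # Assumption: pdftk is installed and reachable via the PATH.
  if (!nzchar(Sys.which("pdftk"))) {
    stop("pdftk was not found on the PATH")
  }
  missing <- input_filepaths[!file.exists(input_filepaths)]
  if (length(missing) > 0) {
    stop("Missing input file(s): ", paste(missing, collapse = ", "))
  }
  # system2() returns the exit status when output is not captured.
  status <- system2("pdftk",
                    args = pdftk_cat_args(input_filepaths, output_filepath))
  if (status != 0) {
    stop("pdftk exited with status ", status)
  }
  invisible(output_filepath)
}
```

Failing early with a readable message is mostly a convenience when merging hundreds of files, where one mistyped path is otherwise easy to miss.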
bschneidr answered Oct 09 '22 13:10