Is there a way to count the number of pages in a PDF file from R? If not, is there another OS-independent way to do this? So far, the only answer I have found is this, and it is specific to Windows 7.
I am trying to compile some reports in R and knitr, aggregating the PDF plot output from a previous script, which automatically processes hundreds of data sets. Some of the data sets are bad and end up breaking the plot function. Since the plot function is wrapped in the pdf function, an empty PDF file gets produced, is found by the report, and breaks pdflatex. Modifying the analysis script to avoid producing these PDFs in the first place has proven difficult and is very case-specific. I would really like a function that I can embed in the report to check that a PDF has at least one page before including it. I would prefer an R-based solution, though a bash, LaTeX, knitr, or pdflatex solution might also suffice.
EDIT: Also, as is mentioned in the previous answer I linked to, I tried to use Rpoppler (here) but cannot get it to compile. I am using R version 3.3.0 in a CentOS 6 environment without admin access.
The other suggestions and code seem unnecessarily opaque or complicated. Once pdftools is installed, the pdf_info function returns a list that includes a pages field:
library(pdftools)
# returns number of pages
# assumes your_file_name.pdf is in working directory
pdf_info("your_file_name.pdf")$pages
# to see other available metadata in pdf_info object, use names()
names(pdf_info("your_file_name.pdf"))
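Since the goal in the question is to skip PDFs that contain no pages, the pdf_info call can be wrapped in a small check. This is only a minimal sketch, assuming pdftools is installed: pdf_has_pages is a hypothetical helper name (it is not part of pdftools), and the sketch assumes pdf_info raises an ordinary R error when the file cannot be parsed, which tryCatch then treats as zero pages.

```r
# Sketch: TRUE only if `path` is a readable PDF with at least one page.
# pdf_has_pages() is a hypothetical helper name, not a pdftools function.
pdf_has_pages <- function(path) {
  # A missing or zero-byte file cannot contain a page.
  if (!file.exists(path) || file.size(path) == 0) {
    return(FALSE)
  }
  if (!requireNamespace("pdftools", quietly = TRUE)) {
    stop("pdf_has_pages() needs the pdftools package")
  }
  # Assumption: pdf_info() signals an R error on a malformed PDF;
  # treat that case as "no pages".
  n <- tryCatch(
    pdftools::pdf_info(path)$pages,
    error = function(e) 0L
  )
  isTRUE(n >= 1)
}

# Possible use in a report: keep only the PDFs that are safe to include.
# "plots" is a placeholder directory name.
# pdfs <- list.files("plots", pattern = "\\.pdf$", full.names = TRUE)
# pdfs <- Filter(pdf_has_pages, pdfs)
```

Returning FALSE instead of stopping on a bad file means the report can simply skip that plot and continue with the remaining data sets.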
The script below worked for me.
#########################################
#GET PDF PAGE NUMBER :: R - JULY 16
##########################################
##SOURCE
#----pdftools package
#https://cran.rstudio.com/web/packages/pdftools
#Requirement
#brew install poppler
## TO AVOID ERROR ::: configure: error: cannot determine poppler-glib compile/link flags
#INSTALL PACKAGES
#install.packages("pdftools", dependencies=TRUE) #only once
#IN/OUT FILES
in_put_pdf="pathTo/test.pdf"
out_put_pdf="pathTo/testCopy.pdf"
#LOAD LIBS
library(pdftools)
#Copy of the original file
file.copy(file.path(Sys.getenv("PATH_TO_PDF_FILE"), in_put_pdf), out_put_pdf)
#Various information about the file is collected here
info <- pdf_info(out_put_pdf)
text <- pdf_text(out_put_pdf)
fonts <- pdf_fonts(out_put_pdf)
files <- pdf_attachments(out_put_pdf)
#To get the number of pages as a plain integer
numberOfPageInPdf = info$pages
numberOfPageInPdf
Hope that can help. Good luck.