 

How to count the number of pages in a PDF from R?

Tags: r, pdf

Is there a way to count the number of pages in a PDF file from R? If not, is there another OS-independent way to do this? So far, the only answer I have found is this one, which is specific to Windows 7.

I am trying to compile some reports in R and knitr that aggregate the PDF plot output from a previous script, which automatically processes hundreds of data sets. Some of the data sets are bad and end up breaking the plot function. Since the plot function is wrapped in the pdf function, an empty PDF file still gets produced, is found by the report, and breaks pdflatex. Modifying the analysis script to avoid producing these PDFs in the first place has proven difficult and very case-specific. I would really like a function that I can embed in the report to check that the PDF has at least one page before including it. I would prefer an R-based solution, though a bash, LaTeX, knitr, or pdflatex solution might also suffice.
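Since a bash-level solution would also be acceptable, one option is to call the pdfinfo command-line tool (shipped with poppler-utils and with xpdf) from R via system2() and parse its "Pages:" line. This is a minimal sketch that assumes pdfinfo is installed and on the PATH, which may not hold on every machine; the helper name pdf_pages_cli() is hypothetical.

```r
# Hypothetical helper: page count reported by the external `pdfinfo` tool.
# Returns NA if pdfinfo is not available, the file is missing, or pdfinfo fails.
pdf_pages_cli <- function(path, pdfinfo = Sys.which("pdfinfo")) {
  if (!nzchar(pdfinfo) || !file.exists(path)) return(NA_integer_)
  out <- suppressWarnings(
    system2(pdfinfo, shQuote(path), stdout = TRUE, stderr = FALSE)
  )
  # A non-zero exit status is attached as an attribute by system2()
  if (!is.null(attr(out, "status"))) return(NA_integer_)
  line <- grep("^Pages:", out, value = TRUE)
  if (length(line) == 0) return(NA_integer_)
  as.integer(sub("^Pages:\\s*", "", line[1]))
}
```

In a knitr chunk, a guard such as isTRUE(pdf_pages_cli(f) >= 1) would skip files that are missing, unreadable, or have zero pages, because each of those cases yields NA or 0.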

EDIT: As mentioned in the answer I linked to above, I also tried Rpoppler (here), but I cannot get it to compile. I am using R version 3.3.0 in a CentOS 6 environment without admin access.
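Because Rpoppler will not compile here and there is no admin access, a dependency-free heuristic may be enough when the files come from R's own pdf() device: count the page objects ("/Type /Page", excluding the "/Type /Pages" tree node) in the raw bytes. This is a sketch, not a PDF parser. It assumes page dictionaries are stored as plain text, which should hold for R's pdf() output but not for PDFs that pack objects into compressed object streams. The name count_pages_raw() is hypothetical.

```r
# Heuristic page counter, intended for PDFs written by R's pdf() device.
# Assumption: page objects are not hidden inside compressed object streams.
count_pages_raw <- function(path) {
  if (!file.exists(path)) return(NA_integer_)
  size <- file.info(path)$size
  if (is.na(size) || size == 0) return(0L)  # zero-byte file: no pages
  bytes <- readBin(path, what = "raw", n = size)
  bytes[bytes == as.raw(0)] <- as.raw(32)   # rawToChar() rejects embedded NULs
  txt <- rawToChar(bytes)
  # Match "/Type /Page" but not "/Type /Pages" (the page-tree root)
  hits <- gregexpr("/Type\\s*/Page(?!s)", txt, perl = TRUE, useBytes = TRUE)[[1]]
  sum(hits > 0)
}
```

For example, Filter(function(f) isTRUE(count_pages_raw(f) >= 1), list.files("plots", pattern = "\\.pdf$", full.names = TRUE)) would keep only the non-empty files; "plots" is a placeholder directory.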

asked Jul 14 '16 02:07 by user5359531



2 Answers

The other suggestions and code seem unnecessarily opaque or complicated. Once pdftools is installed, the pdf_info function returns a list that includes a pages field:

    library(pdftools)
    # returns number of pages
    # assumes your_file_name.pdf is in working directory
    pdf_info("your_file_name.pdf")$pages  
    
    # to see other available metadata in pdf_info object, use names()
    names(pdf_info("your_file_name.pdf")) 
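Building on pdf_info() above, a small guard can wrap the call in tryCatch() so that an empty or truncated PDF counts as having no pages instead of stopping the knitr run. This is a sketch assuming pdftools is installed; pdf_has_pages() is a hypothetical helper. Note that if pdftools is missing, the guard silently returns FALSE for every file.

```r
# Hypothetical helper: TRUE only if the file is a readable PDF with >= 1 page.
pdf_has_pages <- function(path) {
  n <- tryCatch(
    pdftools::pdf_info(path)$pages,
    # Unreadable, empty, or missing file (or pdftools not installed)
    error = function(e) 0L
  )
  isTRUE(n >= 1)
}

# Example use inside a knitr chunk (the file name is a placeholder):
# f <- "your_file_name.pdf"
# if (pdf_has_pages(f)) knitr::include_graphics(f)
```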
answered Oct 23 '22 06:10 by Omar Wasow


The script below worked for me.

    #########################################
    # GET PDF PAGE NUMBER :: R - JULY 16
    #########################################

    ## SOURCE
    # pdftools package: https://cran.rstudio.com/web/packages/pdftools

    ## REQUIREMENT
    # The poppler system library must be installed first (e.g. with Homebrew):
    # brew install poppler
    # This avoids: configure: error: cannot determine poppler-glib compile/link flags

    ## INSTALL PACKAGES (only once)
    # install.packages("pdftools", dependencies = TRUE)

    ## IN/OUT FILES
    in_put_pdf  <- "pathTo/test.pdf"
    out_put_pdf <- "pathTo/testCopy.pdf"

    ## LOAD LIBS
    library(pdftools)

    ## Work on a copy of the original file
    file.copy(in_put_pdf, out_put_pdf)

    ## Various information about the file
    info  <- pdf_info(out_put_pdf)         # metadata list (version, pages, ...)
    text  <- pdf_text(out_put_pdf)         # one character string per page
    fonts <- pdf_fonts(out_put_pdf)        # fonts used in the document
    files <- pdf_attachments(out_put_pdf)  # embedded attachments

    ## To get the number of pages (a single number, not a sub-list)
    numberOfPageInPdf <- info$pages
    numberOfPageInPdf
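For the original use case of hundreds of plot files, the same pdf_info() call can be applied to a whole directory so that only files with at least one page are kept. A sketch, assuming the plots live in one folder (the directory path is a placeholder) and treating any file that pdf_info() cannot read as having zero pages; count_pages_in_dir() is a hypothetical name.

```r
# Hypothetical helper: named vector of page counts for every PDF in a folder.
# Files that pdftools cannot read (or a missing pdftools) count as 0 pages.
count_pages_in_dir <- function(plot_dir) {
  pdfs <- list.files(plot_dir, pattern = "\\.pdf$", full.names = TRUE,
                     ignore.case = TRUE)
  vapply(pdfs, function(f) {
    n <- tryCatch(pdftools::pdf_info(f)$pages, error = function(e) 0)
    if (length(n) != 1) 0 else as.numeric(n)
  }, numeric(1))
}

# Usage: keep only the plots that are safe to include in the report
# pages <- count_pages_in_dir("pathTo/plots")
# good  <- names(pages)[pages >= 1]
```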

Hope that helps. Good luck.

answered Oct 23 '22 06:10 by NajlaBioinfo