Many hours have I searched for a fast and easy, but mostly accurate, way to get the number of pages in a PDF document. Since I work for a graphic printing and reproduction company that works a lot with PDFs, the number of pages in a document must be precisely known before they are processed. PDF documents come from many different clients, so they aren't generated with the same application and/or don't use the same compression method.
Here are some of the answers I found insufficient or simply NOT working:
Imagick requires a lot of installation, apache needs to restart, and when I finally had it working, it took amazingly long to process (2-3 minutes per document) and it always returned 1
page in every document (haven't seen a working copy of Imagick so far), so I threw it away. That was with both the getNumberImages()
and identifyImage()
methods.
FPDI is easy to use and install (just extract files and call a PHP script), BUT many of the compression techniques are not supported by FPDI. It then returns an error:
FPDF error: This document (test_1.pdf) probably uses a compression technique which is not supported by the free parser shipped with FPDI.
This opens the PDF file in a stream and searches for some kind of string, containing the pagecount or something similar.
$f = "test1.pdf"; $stream = fopen($f, "r"); $content = fread ($stream, filesize($f)); if(!$stream || !$content) return 0; $count = 0; // Regular Expressions found by Googling (all linked to SO answers): $regex = "/\/Count\s+(\d+)/"; $regex2 = "/\/Page\W*(\d+)/"; $regex3 = "/\/N\s+(\d+)/"; if(preg_match_all($regex, $content, $matches)) $count = max($matches); return $count;
/\/Count\s+(\d+)/
(looks for /Count <number>
) doesn't work because only a few documents have the parameter /Count
inside, so most of the time it doesn't return anything. Source. /\/Page\W*(\d+)/
(looks for /Page<number>
) doesn't get the number of pages, mostly contains some other data. Source. /\/N\s+(\d+)/
(looks for /N <number>
) doesn't work either, as the documents can contain multiple values of /N
; most, if not all, not containing the pagecount. Source. So, what does work reliable and accurate?
See the answer below
In Adobe Acrobat Pro, go to file > create PDF > merge files into a single PDF. Then add files and select the files you want. Click combine, and see how many pages are in the final PDF.
It does not use any PHP library for performing this task. The following line of code can be used. $pageCount = (new TCPDI())->setSourceData((string)file_get_contents($fileName)); Method 3: Using pdfinfo: For Linux users, there is a faster way to count the number of pages in a pdf document than “identity” function.
It is downloadable for Linux and Windows. You download a compressed file containing several little PDF-related programs. Extract it somewhere.
One of those files is pdfinfo (or pdfinfo.exe for Windows). An example of data returned by running it on a PDF document:
Title: test1.pdf Author: John Smith Creator: PScript5.dll Version 5.2.2 Producer: Acrobat Distiller 9.2.0 (Windows) CreationDate: 01/09/13 19:46:57 ModDate: 01/09/13 19:46:57 Tagged: yes Form: none Pages: 13 <-- This is what we need Encrypted: no Page size: 2384 x 3370 pts (A0) File size: 17569259 bytes Optimized: yes PDF version: 1.6
I haven't seen a PDF document where it returned a false pagecount (yet). It is also really fast, even with big documents of 200+ MB the response time is a just a few seconds or less.
There is an easy way of extracting the pagecount from the output, here in PHP:
// Make a function for convenience function getPDFPages($document) { $cmd = "/path/to/pdfinfo"; // Linux $cmd = "C:\\path\\to\\pdfinfo.exe"; // Windows // Parse entire output // Surround with double quotes if file name has spaces exec("$cmd \"$document\"", $output); // Iterate through lines $pagecount = 0; foreach($output as $op) { // Extract the number if(preg_match("/Pages:\s*(\d+)/i", $op, $matches) === 1) { $pagecount = intval($matches[1]); break; } } return $pagecount; } // Use the function echo getPDFPages("test 1.pdf"); // Output: 13
Of course this command line tool can be used in other languages that can parse output from an external program, but I use it in PHP.
I know its not pure PHP, but external programs are way better in PDF handling (as seen in the question).
I hope this can help people, because I have spent a whole lot of time trying to find the solution to this and I have seen a lot of questions about PDF pagecount in which I didn't find the answer I was looking for. That's why I made this question and answered it myself.
Security Notice: Use escapeshellarg
on $document
if document name is being fed from user input or file uploads.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With