Get the number of pages in a PDF document

This question is for referencing and comparing. The solution is the accepted answer below.

I have spent many hours searching for a fast and easy, but above all accurate, way to get the number of pages in a PDF document. I work for a graphic printing and reproduction company that handles a lot of PDFs, and the number of pages in a document must be known precisely before it is processed. The PDF documents come from many different clients, so they are not generated with the same application and do not all use the same compression method.

Here are some of the answers I found insufficient or simply NOT working:

Using Imagick (a PHP extension)

Imagick takes a lot of work to install, and Apache needs to be restarted. When I finally had it working, it took a very long time to process each document (2-3 minutes) and it always returned 1 page for every document, with both the getNumberImages() and identifyImage() methods. I haven't seen a working Imagick setup so far, so I threw it away.

Using FPDI (a PHP library)

FPDI is easy to use and install (just extract files and call a PHP script), BUT many of the compression techniques are not supported by FPDI. It then returns an error:

FPDF error: This document (test_1.pdf) probably uses a compression technique which is not supported by the free parser shipped with FPDI.

Opening a stream and searching with a regular expression:

This opens the PDF file in a stream and searches for some kind of string, containing the pagecount or something similar.

    // (inside a function)
    $f = "test1.pdf";
    $stream = fopen($f, "rb");
    if (!$stream)
        return 0;

    $content = fread($stream, filesize($f));
    fclose($stream);
    if (!$content)
        return 0;

    $count = 0;
    // Regular expressions found by Googling (all from SO answers):
    $regex  = "/\/Count\s+(\d+)/";
    $regex2 = "/\/Page\W*(\d+)/";
    $regex3 = "/\/N\s+(\d+)/";

    // $matches[1] holds the captured numbers
    if (preg_match_all($regex, $content, $matches))
        $count = max(array_map("intval", $matches[1]));

    return $count;

/\/Count\s+(\d+)/ (looks for /Count <number>) doesn't work because only a few of the documents contain a readable /Count entry, so most of the time it returns nothing.
/\/Page\W*(\d+)/ (looks for /Page<number>) doesn't return the number of pages; the match mostly contains some other data.
/\/N\s+(\d+)/ (looks for /N <number>) doesn't work either, because a document can contain multiple /N values, and most, if not all, of them are not the page count.
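To illustrate why the /Count pattern is fragile even when it does match, here is a minimal sketch run against a hand-written fragment. The fragment and the function name countCandidates are made up for illustration and are not taken from a real file. A page tree can contain nested /Pages nodes, each with its own /Count, so the pattern may return several numbers. In files that keep these objects inside compressed object streams, the pattern finds nothing in the raw bytes at all.

```php
<?php
// Hypothetical helper: collect every number that follows "/Count" in raw PDF bytes.
function countCandidates(string $content): array
{
    if (!preg_match_all('/\/Count\s+(\d+)/', $content, $matches)) {
        return []; // no match (or a regex error)
    }
    return array_map('intval', $matches[1]);
}

// Hand-written fragment of an uncompressed page tree (illustrative only).
$fragment = "1 0 obj << /Type /Pages /Kids [2 0 R 5 0 R] /Count 13 >> endobj\n"
          . "2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >> endobj\n"
          . "5 0 obj << /Type /Pages /Kids [6 0 R] /Count 11 >> endobj\n";

$candidates = countCandidates($fragment);
print_r($candidates); // three candidates: 13, 2 and 11

// Taking the largest value is only a guess at the total page count.
echo $candidates ? max($candidates) : 0, PHP_EOL;
```

Taking the maximum happens to give the right answer for this fragment, but nothing guarantees that for arbitrary files.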

So, what does work reliably and accurately?

See the answer below

834

asked Feb 01 '13 10:02

Richard de Wit

1 Answer

A simple command line executable called: pdfinfo.

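It is downloadable for Linux and Windows as part of the Xpdf command-line tools (a Poppler-based pdfinfo is also packaged as poppler-utils on many Linux distributions). You download a compressed file containing several small PDF-related programs. Extract it somewhere.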

One of those files is pdfinfo (or pdfinfo.exe for Windows). An example of data returned by running it on a PDF document:

    Title:          test1.pdf
    Author:         John Smith
    Creator:        PScript5.dll Version 5.2.2
    Producer:       Acrobat Distiller 9.2.0 (Windows)
    CreationDate:   01/09/13 19:46:57
    ModDate:        01/09/13 19:46:57
    Tagged:         yes
    Form:           none
    Pages:          13    <-- This is what we need
    Encrypted:      no
    Page size:      2384 x 3370 pts (A0)
    File size:      17569259 bytes
    Optimized:      yes
    PDF version:    1.6
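Since every line of this output has the form "Key: value", it can be turned into an associative array before picking out Pages. The following is only a sketch: the function name parsePdfinfoLines is made up here, and the sample lines are copied from the output above. Field names can differ between pdfinfo builds, so treat the keys as assumptions.

```php
<?php
// Hypothetical helper: turn pdfinfo's "Key: value" lines into an associative array.
function parsePdfinfoLines(array $lines): array
{
    $info = [];
    foreach ($lines as $line) {
        // Split on the first colon only, so values such as "19:46:57" stay intact.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $info[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $info;
}

// Sample lines taken from the pdfinfo output shown above.
$sample = [
    'Title:          test1.pdf',
    'CreationDate:   01/09/13 19:46:57',
    'Pages:          13',
    'Page size:      2384 x 3370 pts (A0)',
];

$info = parsePdfinfoLines($sample);
echo (int) ($info['Pages'] ?? 0), PHP_EOL; // prints 13
```

This keeps the other fields (page size, encryption, PDF version) available as well, which may be handy in a print workflow.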

I haven't seen a PDF document for which it returned a wrong page count (yet). It is also really fast: even with big documents of 200+ MB, the response time is just a few seconds or less.

There is an easy way of extracting the pagecount from the output, here in PHP:

    // Make a function for convenience
    function getPDFPages($document)
    {
        // Pick the line that matches your platform
        $cmd = "/path/to/pdfinfo";              // Linux
        // $cmd = "C:\\path\\to\\pdfinfo.exe";  // Windows

        // Parse entire output
        // Surround with double quotes if file name has spaces
        exec("$cmd \"$document\"", $output);

        // Iterate through lines
        $pagecount = 0;
        foreach ($output as $op)
        {
            // Extract the number
            if (preg_match("/Pages:\s*(\d+)/i", $op, $matches) === 1)
            {
                $pagecount = intval($matches[1]);
                break;
            }
        }

        return $pagecount;
    }

    // Use the function
    echo getPDFPages("test 1.pdf");  // Output: 13

Of course this command line tool can be used in other languages that can parse output from an external program, but I use it in PHP.

I know it's not pure PHP, but external programs are far better at PDF handling (as seen in the question).

I hope this can help people, because I have spent a whole lot of time trying to find the solution to this and I have seen a lot of questions about PDF pagecount in which I didn't find the answer I was looking for. That's why I made this question and answered it myself.

Security Notice: Use escapeshellarg() on $document if the document name comes from user input or file uploads.
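As a rough sketch of what that could look like: the variant below passes the file name through escapeshellarg(), checks the exit code reported by exec(), and returns null instead of 0 when no page count was found. The names extractPageCount and getPdfPageCountSafe and the default "pdfinfo" command are assumptions for illustration; point $pdfinfo at wherever the binary actually lives.

```php
<?php
// Hypothetical helper: pull the page count out of pdfinfo's output lines.
// Returns null if there is no "Pages:" line.
function extractPageCount(array $outputLines): ?int
{
    foreach ($outputLines as $line) {
        if (preg_match('/^Pages:\s*(\d+)/i', $line, $m) === 1) {
            return (int) $m[1];
        }
    }
    return null;
}

// Hypothetical wrapper: $pdfinfo is assumed to be trusted configuration,
// $document may come from user input.
function getPdfPageCountSafe(string $document, string $pdfinfo = 'pdfinfo'): ?int
{
    // escapeshellarg() quotes the file name so that spaces or shell
    // metacharacters in it are not interpreted by the shell.
    $cmd = $pdfinfo . ' ' . escapeshellarg($document);

    $output = [];
    $exitCode = 1;
    exec($cmd, $output, $exitCode);

    if ($exitCode !== 0) {
        return null; // pdfinfo missing, file unreadable, or not a valid PDF
    }
    return extractPageCount($output);
}

// Parsing can be checked without running pdfinfo at all:
var_dump(extractPageCount(['Tagged:         yes', 'Pages:          13'])); // int(13)
```

Returning null rather than 0 lets the caller tell "could not determine the page count" apart from a real result.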

192

answered Oct 21 '22 11:10

Richard de Wit


Tags:

php

pdf
