Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I find the orientation of a PDF using PHP or a Linux script?

Tags:

linux

php

pdf

I need to scan an uploaded PDF to determine if the pages within are all portrait or if there are any landscape pages. Is there someway I can use PHP or a linux command to scan the PDF for these pages?

like image 536
Serenade X Avatar asked Jul 17 '12 19:07

Serenade X


People also ask

How do I know the orientation of a PDF?

Open the PDF that has the orientation you want to change and go to the “View” drop-down menu at the top of the screen. Hover your mouse over “Rotate View” from the options that appear. Adobe gives you the choice of rotating clockwise or counterclockwise in 90-degree increments.

How do you know a PDF is portrait or landscape?

It is regarded as 'landscape' if the width is greater than the height. It is regarded as 'portrait' if the height is greater than the width. It is undetermined if width and height have the same value.

How do I change the layout of a PDF to landscape?

Change part of a document to landscape Select the content that you want on a landscape page. Go to Layout, and open the Page Setup dialog box. Select Landscape, and in the Apply to box, choose Selected text.


2 Answers

(Updated answer -- scroll down...)

You can use either pdfinfo (part of either the poppler-utils or the xpdf-tools) or identify (part of the ImageMagick toolkit).

identify:

identify -format "%f  Page %s:   Width: %W -- Height: %H\n" T-VD7.PDF

Example output:

T-VD7.PDF  Page 0:  Width: 595 -- Height: 842
T-VD7.PDF  Page 1:  Width: 595 -- Height: 842
T-VD7.PDF  Page 2:  Width: 1191 -- Height: 842
[...]
T-VD7.PDF  Page 11:  Width: 595 -- Height: 421
T-VD7.PDF  Page 12:  Width: 595 -- Height: 842

Or a bit simpler:

identify -format "%s: %Wx%H\n" T-VD7.PDF

gives:

0:  595x842
1:  595x842
2:  1191x842
[...]
11:  595x421
12:  595x842

Note, how identify uses a zero-based page counting mechanism!

Pages are 'landscape' if their width is bigger than their height. They are neither-nor, if both are equal.

The advantage is that identify lets you tweak the output format quite easily and very extensively.

pdfinfo:

pdfinfo input.pdf | grep "Page.*size:"

Example output:

Page size:      595.276 x 841.89 pts (A4)

pdfinfo is definitely faster and also more precise than identify, if it comes to multi-page PDFs. (The 13-page PDF I tested this with took identify to 31 seconds to process, whereas pdfinfo needed less than half a second....)

Be warned: by default pdfinfo does report the size for the first page only. To get sizes for all pages (as you may know, there are PDFs which use mixed page sizes as well as mixed orientations), you have to modify the command:

pdfinfo -f 3 -l 13 input.pdf | grep "Page.*size:"

Output now:

Page    1 size: 595.276 x 841.89 pts (A4)
Page    2 size: 595.276 x 841.89 pts (A4)
Page    3 size: 1191 x 842 pts (A3)
[....]
Page   12 size: 595 x 421 pts (A5)
Page   13 size: 595.276 x 841.89 pts (A4)

This will print the sizes of page 3 (f irst to report) through page 13 (l ast to report).

Scripting it:

  pdfinfo \
    -f 1 \
    -l 1000 \
     Vergleich-VD7.PDF \
| grep "Page.* size:" \
| \
| while read Page _pageno size _width x _height rest; do 
  [ "$(echo "${_width} / 1"|bc)" -gt "$(echo "${_height} / 1"|bc)" ] \
     && echo "Page $_pageno is landscape..." \
    || echo "Page $_pageno is portrait..."  ; \
 done

(the bc-trick is required because the -gt comparison works for the shell only with integers. Dividing by 1 with bc will take round the possible real values to integers...)

Result:

Page 1 is portrait...
Page 2 is portrait...
Page 3 is landscape...
[...]
Page 12 is landscape...
Page 13 is portrait...

Update: Using the 'right' pdfinfo to discover page rotations...

My initial answer tooted the horn of pdfinfo. Serenade X says in a comment that his/her problem is to discover rotated pages.

Ok now, here is some additional info which is not yet known widely and therefor has not yet been really absorbed by all pdfinfo users...

As I mentioned, there are two different pdfinfo utilities around:

  1. the one which comes as part of the xpdf-utils package (on some platform also named xpdf-tools).
  2. the one which comes as part of the poppler-utils package (on some platforms also named poppler-tools, and sometimes it is not separated out as a packages but is part of the main poppler package).

Poppler's pdfinfo output

So here is a sample output from Poppler's pdfinfo command. The tested file is a 2-page PDF where the first page is in portrait A4 and the second page is in landscape A4 format:

kp@mbp:~$  pdfinfo -f 1 -l 2 a4portrait+landscape.pdf 
Producer:       GPL Ghostscript 9.05
CreationDate:   Thu Jul 26 14:23:31 2012
ModDate:        Thu Jul 26 14:23:31 2012
Tagged:         no
Form:           none
Pages:          2
Encrypted:      no
Page    1 size: 595 x 842 pts (A4)
Page    1 rot:  0
Page    2 size: 842 x 595 pts (A4)
Page    2 rot:  0
File size:      3100 bytes
Optimized:      no
PDF version:    1.4

Do you see the lines saying Page 1 rot: 0 and Page 2 rot: 0?

Do you notice the lines saying Page 1 size: 595 x 842 pts (A4) and Page 2 size: 842 x 595 pts (A4) and the differences between the two?

XPDF's pdfinfo output

Now let's compare this to the output of XPDF's pdfinfo:

kp@mbp:~$  xpdf-pdfinfo -f 1 -l 2 a4portrait+landscape.pdf 
Producer:       GPL Ghostscript 9.05
CreationDate:   Thu Jul 26 14:23:31 2012
ModDate:        Thu Jul 26 14:23:31 2012
Tagged:         no
Pages:          2
Encrypted:      no
Page    1 size: 595 x 842 pts (A4)
Page    2 size: 842 x 595 pts (A4)
File size:      3100 bytes
Optimized:      no
PDF version:    1.4

You may notice one more difference, if you look closely enough. I won't point my finger to it, and will keep my mouth shut for now... :-)

Poppler's pdfinfo correctly reports rotation of page 2

Next, I rotate the second page of the file by 90 degrees using pdftk (I don't have Adobe Acrobat around):

pdftk \
  a4portrait+landscape.pdf \
  cat 1 2E \
  output a4portrait+landscape---page2-landscaped-by-pdftk.pdf 

Now Poppler's pdfinfo reports this:

kp@mbp:~$  pdfinfo -f 1 -l 2 a4portrait+landscape---page2-landscaped-by-pdftk.pdf 
Creator:        pdftk 1.44 - www.pdftk.com
Producer:       itext-paulo-155 (itextpdf.sf.net-lowagie.com)
CreationDate:   Thu Jul 26 14:39:47 2012
ModDate:        Thu Jul 26 14:39:47 2012
Tagged:         no
Form:           none
Pages:          2
Encrypted:      no
Page    1 size: 595 x 842 pts (A4)
Page    1 rot:  0
Page    2 size: 842 x 595 pts (A4)
Page    2 rot:  90
File size:      1759 bytes
Optimized:      no
PDF version:    1.4

As you can see, the line Page 2 rot: 90 tells us what we are looking for. XPDF's pdfinfo would essentially report the same info about the changed file as it does about the original one. Of course, it would still correctly capture the changed Creator:, Producer: and *Date: infos, but it would miss the rotated page...

Also note this detail: page 2 originally was designed as a landscape page, which can be seen from the Page 2 size: 842 x 595 pts (A4) info part. However, it shows up in the current PDF as a portrait page, as can be seen by the Page 2 rot: 90 part.

Also note that there are 4 different values that could appear for the rotation info:

  • 0 (no rotation),
  • 90 (rotation to the East, or 90 degrees clockwise),
  • 180 (rotation to the South, tumbled page image, upside-down, or 180 degrees clockwise),
  • 270 (rotation to the West, or 90 degrees counter-clockwise, or 270 degrees clockwise).

Some Background Info

Popper (developed by The Poppler Developers) is a fork of XPDF (developed by Glyph & Cog LLC), that happened around 2005. (As one of their important reason for their forking the Poppler developer at the time gave: Glyph & Cog didn't always provide timely bugfixes for security-related problems...)

Anyway, the Poppler fork for a very long time kept the associated commandline utilities, their commandline parameters and syntax as well as the format of their output compatible to the original (XPDF/Glyph & Cog LLC) ones.

Existing Poppler tools gaining additional features over competing XPDF tools

However, more recently they started to add additional features. Out of the top of my head:

  • pdfinfo now also reports the rotation status of each page (starting with Poppler v0.19.0, released March 1st, 2012).
  • pdffonts now also reports the font encoding for each font (starting with Poppler v0.19.1, released March 15th, 2012).

Poppler tools getting more siblings

The Poppler tools also provide some extra commandline utilities which are not in the original XPDF package (some of which have been added only quite recently):

  • pdftocairo - utility for creating PNG, JPEG, PostScript, EPS, PDF, SVG (using Cairo)
  • pdfseparate - utility to extract PDF pages
  • pdfunite - utility to merge PDF files
  • pdfdetach - utility to list or extract embedded files from a PDF
  • pdftohtml - utility to convert HTML from PDF files
like image 120
Kurt Pfeifle Avatar answered Oct 20 '22 00:10

Kurt Pfeifle


identify which comes with ImageMagick will give you the width and height of a given PDF file (it also requires GhostScript be installed on the system).

$ identify -format "%g\n" FILENAME.PDF
1417x1106+0+0

Where 1417 is the width, 1106 is the height, and you (for this purpose) can ignore the +0+0.

Edit: Sorry, I was referring to Mike B's comment on the original question - as he said, after knowing the width and height you can determine if you have a portrait or landscape image (if height > width then portrait else landscape).

Also, the \n added to the -format argument (as suggested by Kurt Pfeifle) will separate each page into its own line. He also mentions the %W and %H format parameters; all the possible format parameters can be found here (there are a lot of them).

like image 25
WWW Avatar answered Oct 20 '22 02:10

WWW