I need to scan an uploaded PDF to determine if the pages within are all portrait or if there are any landscape pages. Is there someway I can use PHP or a linux command to scan the PDF for these pages?
Open the PDF that has the orientation you want to change and go to the “View” drop-down menu at the top of the screen. Hover your mouse over “Rotate View” from the options that appear. Adobe gives you the choice of rotating clockwise or counterclockwise in 90-degree increments.
It is regarded as 'landscape' if the width is greater than the height. It is regarded as 'portrait' if the height is greater than the width. It is undetermined if width and height have the same value.
Change part of a document to landscape Select the content that you want on a landscape page. Go to Layout, and open the Page Setup dialog box. Select Landscape, and in the Apply to box, choose Selected text.
You can use either pdfinfo
(part of either the poppler-utils or the xpdf-tools) or identify
(part of the ImageMagick toolkit).
identify -format "%f Page %s: Width: %W -- Height: %H\n" T-VD7.PDF
Example output:
T-VD7.PDF Page 0: Width: 595 -- Height: 842
T-VD7.PDF Page 1: Width: 595 -- Height: 842
T-VD7.PDF Page 2: Width: 1191 -- Height: 842
[...]
T-VD7.PDF Page 11: Width: 595 -- Height: 421
T-VD7.PDF Page 12: Width: 595 -- Height: 842
Or a bit simpler:
identify -format "%s: %Wx%H\n" T-VD7.PDF
gives:
0: 595x842
1: 595x842
2: 1191x842
[...]
11: 595x421
12: 595x842
Note, how identify
uses a zero-based page counting mechanism!
Pages are 'landscape' if their width is bigger than their height. They are neither-nor, if both are equal.
The advantage is that identify
lets you tweak the output format quite easily and very extensively.
pdfinfo input.pdf | grep "Page.*size:"
Example output:
Page size: 595.276 x 841.89 pts (A4)
pdfinfo
is definitely faster and also more precise than identify
, if it comes to multi-page PDFs. (The 13-page PDF I tested this with took identify
to 31 seconds to process, whereas pdfinfo
needed less than half a second....)
Be warned: by default pdfinfo
does report the size for the first page only. To get sizes for all pages (as you may know, there are PDFs which use mixed page sizes as well as mixed orientations), you have to modify the command:
pdfinfo -f 3 -l 13 input.pdf | grep "Page.*size:"
Output now:
Page 1 size: 595.276 x 841.89 pts (A4)
Page 2 size: 595.276 x 841.89 pts (A4)
Page 3 size: 1191 x 842 pts (A3)
[....]
Page 12 size: 595 x 421 pts (A5)
Page 13 size: 595.276 x 841.89 pts (A4)
This will print the sizes of page 3 (f irst to report) through page 13 (l ast to report).
pdfinfo \
-f 1 \
-l 1000 \
Vergleich-VD7.PDF \
| grep "Page.* size:" \
| \
| while read Page _pageno size _width x _height rest; do
[ "$(echo "${_width} / 1"|bc)" -gt "$(echo "${_height} / 1"|bc)" ] \
&& echo "Page $_pageno is landscape..." \
|| echo "Page $_pageno is portrait..." ; \
done
(the bc
-trick is required because the -gt
comparison works for the shell only with integers. Dividing by 1
with bc
will take round the possible real values to integers...)
Result:
Page 1 is portrait...
Page 2 is portrait...
Page 3 is landscape...
[...]
Page 12 is landscape...
Page 13 is portrait...
pdfinfo
to discover page rotations...My initial answer tooted the horn of pdfinfo
. Serenade X says in a comment that his/her problem is to discover rotated pages.
Ok now, here is some additional info which is not yet known widely and therefor has not yet been really absorbed by all pdfinfo
users...
As I mentioned, there are two different pdfinfo
utilities around:
xpdf-utils
package (on some platform also named xpdf-tools
).poppler-utils
package (on some platforms also named poppler-tools
, and sometimes it is not separated out as a packages but is part of the main poppler
package).pdfinfo
outputSo here is a sample output from Poppler's pdfinfo
command. The tested file is a 2-page PDF where the first page is in portrait A4 and the second page is in landscape A4 format:
kp@mbp:~$ pdfinfo -f 1 -l 2 a4portrait+landscape.pdf Producer: GPL Ghostscript 9.05 CreationDate: Thu Jul 26 14:23:31 2012 ModDate: Thu Jul 26 14:23:31 2012 Tagged: no Form: none Pages: 2 Encrypted: no Page 1 size: 595 x 842 pts (A4) Page 1 rot: 0 Page 2 size: 842 x 595 pts (A4) Page 2 rot: 0 File size: 3100 bytes Optimized: no PDF version: 1.4
Do you see the lines saying Page 1 rot: 0
and Page 2 rot: 0
?
Do you notice the lines saying Page 1 size: 595 x 842 pts (A4)
and Page 2 size: 842 x 595 pts (A4)
and the differences between the two?
pdfinfo
outputNow let's compare this to the output of XPDF's pdfinfo
:
kp@mbp:~$ xpdf-pdfinfo -f 1 -l 2 a4portrait+landscape.pdf Producer: GPL Ghostscript 9.05 CreationDate: Thu Jul 26 14:23:31 2012 ModDate: Thu Jul 26 14:23:31 2012 Tagged: no Pages: 2 Encrypted: no Page 1 size: 595 x 842 pts (A4) Page 2 size: 842 x 595 pts (A4) File size: 3100 bytes Optimized: no PDF version: 1.4
You may notice one more difference, if you look closely enough. I won't point my finger to it, and will keep my mouth shut for now... :-)
pdfinfo
correctly reports rotation of page 2Next, I rotate the second page of the file by 90 degrees using pdftk
(I don't have Adobe Acrobat around):
pdftk \
a4portrait+landscape.pdf \
cat 1 2E \
output a4portrait+landscape---page2-landscaped-by-pdftk.pdf
Now Poppler's pdfinfo
reports this:
kp@mbp:~$ pdfinfo -f 1 -l 2 a4portrait+landscape---page2-landscaped-by-pdftk.pdf Creator: pdftk 1.44 - www.pdftk.com Producer: itext-paulo-155 (itextpdf.sf.net-lowagie.com) CreationDate: Thu Jul 26 14:39:47 2012 ModDate: Thu Jul 26 14:39:47 2012 Tagged: no Form: none Pages: 2 Encrypted: no Page 1 size: 595 x 842 pts (A4) Page 1 rot: 0 Page 2 size: 842 x 595 pts (A4) Page 2 rot: 90 File size: 1759 bytes Optimized: no PDF version: 1.4
As you can see, the line Page 2 rot: 90
tells us what we are looking for. XPDF's pdfinfo
would essentially report the same info about the changed file as it does about the original one. Of course, it would still correctly capture the changed Creator:
, Producer:
and *Date:
infos, but it would miss the rotated page...
Also note this detail: page 2 originally was designed as a landscape page, which can be seen from the Page 2 size: 842 x 595 pts (A4)
info part. However, it shows up in the current PDF as a portrait page, as can be seen by the Page 2 rot: 90
part.
Also note that there are 4 different values that could appear for the rotation info:
0
(no rotation),90
(rotation to the East, or 90 degrees clockwise),180
(rotation to the South, tumbled page image, upside-down, or 180 degrees clockwise),270
(rotation to the West, or 90 degrees counter-clockwise, or 270 degrees clockwise).Popper (developed by The Poppler Developers) is a fork of XPDF (developed by Glyph & Cog LLC), that happened around 2005. (As one of their important reason for their forking the Poppler developer at the time gave: Glyph & Cog didn't always provide timely bugfixes for security-related problems...)
Anyway, the Poppler fork for a very long time kept the associated commandline utilities, their commandline parameters and syntax as well as the format of their output compatible to the original (XPDF/Glyph & Cog LLC) ones.
However, more recently they started to add additional features. Out of the top of my head:
pdfinfo
now also reports the rotation status of each page (starting with Poppler v0.19.0, released March 1st, 2012).pdffonts
now also reports the font encoding for each font (starting with Poppler v0.19.1, released March 15th, 2012).The Poppler tools also provide some extra commandline utilities which are not in the original XPDF package (some of which have been added only quite recently):
pdftocairo
- utility for creating PNG, JPEG, PostScript, EPS, PDF, SVG (using Cairo)pdfseparate
- utility to extract PDF pagespdfunite
- utility to merge PDF filespdfdetach
- utility to list or extract embedded files from a PDFpdftohtml
- utility to convert HTML from PDF filesidentify
which comes with ImageMagick will give you the width and height of a given PDF file (it also requires GhostScript be installed on the system).
$ identify -format "%g\n" FILENAME.PDF
1417x1106+0+0
Where 1417
is the width, 1106
is the height, and you (for this purpose) can ignore the +0+0
.
Edit: Sorry, I was referring to Mike B's comment on the original question - as he said, after knowing the width and height you can determine if you have a portrait or landscape image (if height > width then portrait else landscape).
Also, the \n
added to the -format
argument (as suggested by Kurt Pfeifle) will separate each page into its own line. He also mentions the %W
and %H
format parameters; all the possible format parameters can be found here (there are a lot of them).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With