I have a PDF document that also contains images.
Now I want to know the resolution of these images.
A first step would be to somehow get the images out of the PDF document. But how?
Is that even possible with something provided in Cocoa?
To find out an image's DPI in Windows, right-click on the file name and select Properties > Details. You'll see the DPI in the Image section, labeled Horizontal Resolution and Vertical Resolution. On a Mac, you need to open the image in Preview and select Tools > Adjust Size. It's labeled Resolution.
PDF files do not have a single "DPI" value. Every bitmap page object has a separate resolution, and of course vector objects such as text have no resolution at all.
This flexibility allows PDF documents to be printed and displayed at the highest quality even at very large sizes.
Have a look at this answer for your other question:
Basically, you can now use the (new) -list parameter for Poppler's pdfimages commandline utility (it will NOT work for XPDF's version of pdfimages!).
It will report the dimensions of each image appearing on the queried pages.
(You can also use it to extract images from a PDF: pdfimages -png -f 3 -l 5 some.pdf prefix--- will extract all images as PNGs from the PDF file, starting with first page 3 and ending with last page 5, using a filename prefix of prefix--- for each image. But this problem seems to not be the main focus of your question...)
pdfimages -list -f 1 -l 3 /Users/kurtpfeifle/Downloads/ct-magazin-14-2012.pdf
page num type width height color comp bpc enc interp object ID
---------------------------------------------------------------------
1 0 image 1247 1738 rgb 3 8 jpx no 3053 0
2 1 image 582 839 gray 1 8 jpeg no 2080 0
2 2 image 344 364 gray 1 8 jpx no 2079 0
3 3 image 581 838 rgb 3 8 jpeg no 7 0
3 4 image 1088 776 rgb 3 8 jpx no 8 0
3 5 image 6 6 rgb 3 8 image no 9 0
3 6 image 8 6 rgb 3 8 image no 10 0
3 7 image 4 6 rgb 3 8 image no 11 0
3 8 image 212 106 rgb 3 8 jpx no 12 0
3 9 image 150 68 rgb 3 8 jpx no 13 0
3 10 image 6 6 rgb 3 8 image no 14 0
3 11 image 4 4 rgb 3 8 image no 15 0
It does not directly report the DPI resolution -- but from the 'width' and 'height' dimensions you can calculate it easily: you measure the width of the picture on your screen with an inch ruler and then divide the 'width pixels' by the measured ruler number...
You find this strange, because the result is dependent on your current zoom level? Yes, it is!
The concept of the 'resolution' is always dependent on the environment. A so-called 'hi-res' picture basically always has lots of pixels in width and height. This allows for better quality (or 'resolution') if the picture needs to be displayed or printed with higher zoom levels.
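To make the zoom-level point concrete, here is a minimal Python sketch of the arithmetic: pixels divided by the physical size at which they are shown. The function name and the sample numbers are made up for illustration and are not taken from any PDF.

```python
def pixels_per_inch(pixels: int, displayed_inches: float) -> float:
    """Effective resolution of one image side shown at a given physical size."""
    if displayed_inches <= 0:
        raise ValueError("displayed size must be positive")
    return pixels / displayed_inches


# The same (hypothetical) 1200-pixel-wide image at two display sizes:
print(pixels_per_inch(1200, 8.0))   # 150.0
print(pixels_per_inch(1200, 16.0))  # 75.0
```

Doubling the displayed width halves the effective resolution, which is why the pixel dimensions alone do not give you a DPI value.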
Meanwhile there is a new version of (Poppler's) pdfimages:
$ pdfimages -version
pdfimages version 0.33.0
[....]
This reports the resolution of embedded images as well, in PPI (pixels per inch), in horizontal (x-ppi) and vertical (y-ppi) directions:
page num type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
-------------------------------------------------------------------------------------
1 0 image 1247 1738 rgb 3 8 jpx no 3053 0 151 151 228K 3.6%
2 1 image 582 839 gray 1 8 jpeg no 2080 0 72 72 319B 0.1%
2 2 image 344 364 gray 1 8 jpx no 2079 0 150 150 4325B 3.5%
3 3 image 581 838 rgb 3 8 jpeg no 7 0 73 73 1980B 0.1%
3 4 image 1088 776 rgb 3 8 jpx no 8 0 150 151 106K 4.3%
3 5 image 6 6 rgb 3 8 image no 9 0 150 150 108B 100%
3 6 image 8 6 rgb 3 8 image no 10 0 150 150 158B 110%
3 7 image 4 6 rgb 3 8 image no 11 0 150 150 73B 101%
3 8 image 212 106 rgb 3 8 jpx no 12 0 150 150 2396B 3.6%
3 9 image 150 68 rgb 3 8 jpx no 13 0 150 150 1878B 6.1%
3 10 image 6 6 rgb 3 8 image no 14 0 150 150 81B 75%
3 11 image 4 4 rgb 3 8 image no 15 0 150 150 50B 104%
This new feature appeared first in Poppler version 0.25 (released Wed December 11, 2013). It additionally reports...
...of embedded images.
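If you want to post-process the -list report rather than read it by eye, something like the following Python sketch could work. It assumes the whitespace-separated layout shown in the tables above (object ID printed as two tokens, x-ppi and y-ppi present only in newer Poppler versions). The function name parse_pdfimages_list and the dictionary keys are my own choices, not part of Poppler.

```python
from __future__ import annotations


def parse_pdfimages_list(text: str) -> list[dict]:
    """Parse the plain-text table printed by `pdfimages -list` (layout assumed)."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows start with two integers (page, num); skip header and rule.
        if len(tokens) < 12 or not (tokens[0].isdigit() and tokens[1].isdigit()):
            continue
        row = {
            "page": int(tokens[0]),
            "num": int(tokens[1]),
            "type": tokens[2],
            "width": int(tokens[3]),
            "height": int(tokens[4]),
            "color": tokens[5],
            "enc": tokens[8],
            # The object ID is printed as two tokens, e.g. "3053 0".
            "object_id": (int(tokens[10]), int(tokens[11])),
            "x_ppi": None,
            "y_ppi": None,
        }
        # Older versions stop after the object ID and print no ppi columns.
        if len(tokens) >= 14:
            row["x_ppi"] = float(tokens[12])
            row["y_ppi"] = float(tokens[13])
        rows.append(row)
    return rows


sample = """\
page num type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
-------------------------------------------------------------------------------------
1 0 image 1247 1738 rgb 3 8 jpx no 3053 0 151 151 228K 3.6%
2 1 image 582 839 gray 1 8 jpeg no 2080 0 72 72 319B 0.1%
"""

for row in parse_pdfimages_list(sample):
    print(row["page"], row["width"], row["height"], row["x_ppi"], row["y_ppi"])
```

In real use you would feed it the captured output of the command (for instance via subprocess.run with capture_output=True and text=True) instead of the hard-coded sample.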
pdfimages -list
Perhaps I should also make you aware of the limitations of the pdfimages utility, and give an example where its output report is not completely correct.
One example is this handcoded PDF from my (recently created) GitHub repository of PDFs to help beginners to study the syntax of PDF source code.
I originally created this PDF in order to demonstrate a bug with Mozilla's PDF.js renderer. Here is a screenshot about how it looks in PDF.js (left) and how it should look when rendered correctly (right, rendered by Ghostscript and Adobe Reader):
(Right-click on each of the above images and select "Open image in new tab" to see the exact differences...)
The PDF file contains a 2x2 pixels image, embedded only once (with object ID 5 0), but displayed on the page multiple times with different settings, where each time the image is placed...
Under these extreme circumstances pdfimages -list falls flat on its face when trying to determine some of the resolutions for instances of this image:
page num type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
------------------------------------------------------------------------------------
1 0 image 2 2 rgb 3 8 image no 5 0 4 4 13B 108%
1 1 image 2 2 rgb 3 8 image no 5 0 5 3 13B 108%
1 2 image 2 2 rgb 3 8 image no 5 0 3 5 13B 108%
1 3 image 2 2 rgb 3 8 image no 5 0 6 3 13B 108%
1 4 image 2 2 rgb 3 8 image no 5 0 3 10 13B 108%
1 5 image 2 2 rgb 3 8 image no 5 0 4 72000 13B 108%
1 6 image 2 2 rgb 3 8 image no 5 0 4 2 13B 108%
1 7 image 2 2 rgb 3 8 image no 5 0 2 4 13B 108%
1 8 image 2 2 rgb 3 8 image no 5 0 14401 1 13B 108%
1 9 image 2 2 rgb 3 8 image no 5 0 1 2 13B 108%
1 10 image 2 2 rgb 3 8 image no 5 0 0.950 4 13B 108%
1 11 image 2 2 rgb 3 8 image no 5 0 4 0.950 13B 108%
1 12 image 2 2 rgb 3 8 image no 5 0 0.950 4 13B 108%
1 13 image 2 2 rgb 3 8 image no 5 0 1 4 13B 108%
1 14 image 2 2 rgb 3 8 image no 5 0 0.950 4 13B 108%
1 15 image 2 2 rgb 3 8 image no 5 0 0.950 4 13B 108%
1 16 image 2 2 rgb 3 8 image no 5 0 4 0.950 13B 108%
pdfimages -list gets most values correct, if there is no rotation and/or no skewing involved. It is no wonder that there are discrepancies if the image is rotated or skewed: Because how would you even reliably define an x-ppi and y-ppi value for such cases? That explains the (completely wrong) values of 72000 y-ppi for image no. 5 and 14401 x-ppi for image no. 8.
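One possible convention for rotated placements (an assumption on my part, not what pdfimages does) is to measure along the image's own edges instead of along the page axes. A PDF image is painted into a unit square that the current transformation matrix [a b c d e f] maps onto the page, so the image's width edge has length hypot(a, b) and its height edge hypot(c, d) in user space units. The Python sketch below assumes the default of 72 user space units per inch. The function name effective_ppi and the sample matrices are hypothetical.

```python
import math


def effective_ppi(width_px, height_px, ctm, units_per_inch=72.0):
    """Resolution along the image's own edges, given the placement matrix.

    ctm is the 6-number matrix (a, b, c, d, e, f) in effect when the image
    is painted; the translation part (e, f) does not affect the result.
    """
    a, b, c, d, _e, _f = ctm
    side_w = math.hypot(a, b)  # length of the image's width edge on the page
    side_h = math.hypot(c, d)  # length of the image's height edge on the page
    if side_w == 0 or side_h == 0:
        raise ValueError("degenerate placement: image has no area")
    return (width_px * units_per_inch / side_w,
            height_px * units_per_inch / side_h)


# A 2x2 pixel image placed half an inch (36 units) square, unrotated:
print(effective_ppi(2, 2, (36, 0, 0, 36, 100, 100)))   # (4.0, 4.0)
# The same placement rotated by 90 degrees gives the same answer:
print(effective_ppi(2, 2, (0, 36, -36, 0, 100, 100)))  # (4.0, 4.0)
```

Under this convention a pure rotation does not change the reported resolution. Skewed placements still get a number per edge, but since the edges are no longer perpendicular, treat it as a rough indication only.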
As you can easily see, pdfimages is rather clever at determining other image properties:
It reports the object ID 5 0 for all instances of the displayed image, indicating that this image is embedded once, but displayed multiple times on the page.
It reports the image dimensions as 2x2 pixels.
It's not easy, but it's possible. While you cannot do it using PDFDocument, you can instead use the CGPDF* stuff in Quartz. Briefly: you will need to use CGPDFPageGetDictionary() to get the dictionary for the page the image is on, then get the information about its XObject (assuming it's not inlined in the stream) from the dictionary. Even this is not straightforward -- you will need to consult the PDF standard to understand how the XObject may be formatted and then use the various CG* routines to drill down to what you need.
I should add that the default user space unit in a PDF document is 1/72 inch, so by default a page is measured at 72 units per inch. Also, much of the artwork in PDFs is created with vector graphics, so it doesn't really have a DPI at all.