
PDF bleed detection

I'm currently writing a little tool (Python + pyPdf) to test PDFs for printer conformity.

Alas I already get confused at the first task: Detecting if the PDF has at least 3mm 'bleed' (border around the pages where nothing is printed). I already got that I can't detect the bleed for the complete document, since there doesn't seem to be a global one. On the pages however I can detect a total of five different boxes:

  • mediaBox
  • bleedBox
  • trimBox
  • cropBox
  • artBox

I read the pyPdf documentation concerning those boxes, but the only one I understood is the mediaBox which seems to represent the overall page size (i.e. the paper).

The bleedBox pretty obviously ought to define the bleed, but that doesn't always seem to be the case.

Another thing I noted was that, for instance with the PDF, all those boxes have the exact same size (implying no bleed at all) on each page, but when I open it there's a huge amount of bleed. This leads me to think that the individual text elements have their own offset.

So, obviously, just calculating the bleed from mediaBox and bleedBox is not a viable option.

I would be more than delighted if anyone could shed some light on what those boxes actually are and what I can conclude from that (e.g. is one box always smaller than another one).

Bonus question: Can someone tell me what exactly the "default user space unit" mentioned in the documentation is? I'm pretty sure this refers to mm on my machine, but I'd like to enforce mm everywhere.

phryk asked Nov 05 '12 16:11


1 Answer

Quoting from the PDF specification ISO 32000-1:2008 as published by Adobe:

14.11.2 Page Boundaries

14.11.2.1 General

A PDF page may be prepared either for a finished medium, such as a sheet of paper, or as part of a prepress process in which the content of the page is placed on an intermediate medium, such as film or an imposed reproduction plate. In the latter case, it is important to distinguish between the intermediate page and the finished page. The intermediate page may often include additional production-related content, such as bleeds or printer marks, that falls outside the boundaries of the finished page. To handle such cases, a PDF page may define as many as five separate boundaries to control various aspects of the imaging process:

  • The media box defines the boundaries of the physical medium on which the page is to be printed. It may include any extended area surrounding the finished page for bleed, printing marks, or other such purposes. It may also include areas close to the edges of the medium that cannot be marked because of physical limitations of the output device. Content falling outside this boundary may safely be discarded without affecting the meaning of the PDF file.

  • The crop box defines the region to which the contents of the page shall be clipped (cropped) when displayed or printed. Unlike the other boxes, the crop box has no defined meaning in terms of physical page geometry or intended use; it merely imposes clipping on the page contents. However, in the absence of additional information (such as imposition instructions specified in a JDF or PJTF job ticket), the crop box determines how the page’s contents shall be positioned on the output medium. The default value is the page’s media box.

  • The bleed box (PDF 1.3) defines the region to which the contents of the page shall be clipped when output in a production environment. This may include any extra bleed area needed to accommodate the physical limitations of cutting, folding, and trimming equipment. The actual printed page may include printing marks that fall outside the bleed box. The default value is the page’s crop box.

  • The trim box (PDF 1.3) defines the intended dimensions of the finished page after trimming. It may be smaller than the media box to allow for production-related content, such as printing instructions, cut marks, or colour bars. The default value is the page’s crop box.

  • The art box (PDF 1.3) defines the extent of the page’s meaningful content (including potential white space) as intended by the page’s creator. The default value is the page’s crop box.

The page object dictionary specifies these boundaries in the MediaBox, CropBox, BleedBox, TrimBox, and ArtBox entries, respectively (see Table 30). All of them are rectangles expressed in default user space units. The crop, bleed, trim, and art boxes shall not ordinarily extend beyond the boundaries of the media box. If they do, they are effectively reduced to their intersection with the media box. Figure 86 illustrates the relationships among these boundaries. (The crop box is not shown in the figure because it has no defined relationship with any of the other boundaries.)
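The defaulting and clipping rules quoted above can be sketched in a few lines of Python. This is only an illustration, not pyPdf code: `effective_boxes` and `intersect` are made-up helper names, the page is assumed to be a plain dict mapping entry names to `(llx, lly, urx, ury)` tuples, and the tuples are assumed to be already normalized (lower-left corner first).

```python
def intersect(box, limit):
    """Clip box to limit; both are (llx, lly, urx, ury) tuples."""
    return (
        max(box[0], limit[0]),
        max(box[1], limit[1]),
        min(box[2], limit[2]),
        min(box[3], limit[3]),
    )


def effective_boxes(page_dict):
    """Resolve the five page boundaries using the defaults quoted above.

    page_dict is a hypothetical plain dict that maps entry names such as
    "MediaBox" to (llx, lly, urx, ury) tuples in default user space units.
    """
    media = page_dict["MediaBox"]
    # The crop box defaults to the media box.
    crop = intersect(page_dict.get("CropBox", media), media)
    boxes = {"MediaBox": media, "CropBox": crop}
    for name in ("BleedBox", "TrimBox", "ArtBox"):
        # Bleed, trim and art box default to the crop box, and none of
        # them effectively extends beyond the media box.
        boxes[name] = intersect(page_dict.get(name, crop), media)
    return boxes


page = {"MediaBox": (0, 0, 612, 792), "TrimBox": (10, 10, 620, 780)}
for name, box in effective_boxes(page).items():
    print(name, box)
```

In this example only the media box and an oversized trim box are set, so the crop, bleed and art boxes all fall back to the media box and the trim box is cut back to its intersection with it.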

Following that there is a nice graphic showing those boxes in relation to each other:

[Image: PDF boxes illustrated]
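Given that relationship, the declared bleed on each side is simply the distance between the trim box and the bleed box. A minimal sketch, assuming both boxes are `(llx, lly, urx, ury)` tuples in user space units (`bleed_margins` is a hypothetical helper, not part of pyPdf):

```python
def bleed_margins(trim_box, bleed_box):
    """Return (left, bottom, right, top): how far the bleed box extends
    beyond the trim box on each side, in user space units."""
    return (
        trim_box[0] - bleed_box[0],
        trim_box[1] - bleed_box[1],
        bleed_box[2] - trim_box[2],
        bleed_box[3] - trim_box[3],
    )


# Made-up example values: a bleed box that is 9 units larger than the
# trim box on every side.
trim = (9, 9, 604, 851)
bleed = (0, 0, 613, 860)
print(bleed_margins(trim, bleed))
```

Note that this only measures what the boxes declare. If all boxes are identical, the result is zero on every side, and that says nothing about where the page content is actually drawn.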

The reasons why in many cases only the media box is set are

  1. that in case of PDFs meant for electronic consumption (i.e. reading on a computer) the other boxes hardly matter; and

  2. that even in the prepress context they aren't as necessary anymore as they used to be, cf. the article Pedro refers to in his comment.

Concerning your "bonus question": The user space unit is 1⁄72 inch by default; since PDF 1.6, though, it can be changed to any (not necessarily integer) multiple of that size using the UserUnit entry in the page dictionary. Changing it in an existing PDF essentially scales the page, as the user space unit is the basic unit in the device-independent coordinate system of a page. Therefore, unless you want to update each and every command in the page descriptions referring to coordinates to keep the page dimensions, you won't want to enforce a millimeter user space unit... ;)
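So rather than changing the unit in the PDF, it is easier to convert the values you read. A minimal sketch, assuming you pass in the page's UserUnit value yourself (1.0 when the entry is absent); the function names are made up for illustration:

```python
MM_PER_INCH = 25.4
UNITS_PER_INCH = 72  # the default user space unit is 1/72 inch


def units_to_mm(value, user_unit=1.0):
    """Convert a length in user space units to millimeters.

    user_unit is the page's UserUnit multiplier (1.0 if not set).
    """
    return value * user_unit / UNITS_PER_INCH * MM_PER_INCH


def mm_to_units(mm, user_unit=1.0):
    """Convert a length in millimeters to user space units."""
    return mm / MM_PER_INCH * UNITS_PER_INCH / user_unit


# How many default user space units correspond to a 3 mm bleed?
print(round(mm_to_units(3), 3))
```

With this, a 3 mm requirement can be checked by converting each measured distance with `units_to_mm` and comparing it against 3.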

mkl answered Oct 01 '22 15:10