
Preflight program for PDFs using PoDoFo or anything else open source? [closed]

I have to automate a preflight check on PDF documents. The preflight consists of:

  1. Detect the resolution of images in an existing document and change them to 300dpi if they are not already at that resolution.
  2. Detect the colorspace of images and if not in CMYK, then convert them to CMYK using color profiles.
  3. Detect whether or not fonts are embedded in an existing PDF document, and correct this problem by substituting fonts. (or drawing font outlines — I'm not sure about this part).

Just wondering if this can be done using PoDoFo or any other open source project out there, or if I really need to order some proprietary software costing between $2K and $6K. My hosting environment is on Linux and supports PHP, Perl, Python, Ruby, Java.

Any ideas?

user961627 asked Sep 30 '12 12:09



3 Answers

I'm not aware of any ready-made Open Source software which meets your requirements.

Only a part of it could be solved by writing your own shell script (or other program).

  1. Detect resolution of images.

    Run pdfimages -list some.pdf to output a list of the images contained in the PDF along with their dimensions. What is not obvious is that these dimensions are those of the raw image as embedded in the PDF. This could be 720x720 pixels. However, if rendered onto a 10x10 inch square of the page, this image will be 72 DPI on the page. If rendered onto a 1x1 inch square, it will be 720 DPI. Both kinds of 'rendering' inside a PDF can be made from the same embedded raw image, and it is the context of the current 'graphics state' that determines which one applies. So determining the actual DPI of an image as it appears on the page requires some additional PDF parsing...

    In any case, you can tell Ghostscript to re-sample images to 300 dpi, and to use a 'threshold' for this. (Ghostscript will never "upsample" an image; it only downsamples those that exceed the threshold. Upsampling almost never makes sense -- it only blows up the file size with no gain in quality.)

  2. Convert colors to colorspace CMYK using ICC profiles.

    The most recent versions of Ghostscript can do that. See also the most recent Ghostscript documentation describing its support for ICC.

  3. Embed un-embedded fonts.

    Running (and evaluating the results of) pdffonts some.pdf will show you which fonts are not embedded.

    Ghostscript can embed un-embedded fonts.
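
The two detection steps (items 1 and 3) can be sketched in Python. This is only a rough illustration under stated assumptions: effective_dpi and unembedded_fonts are hypothetical helper names, the placed size of an image is assumed to be already known in PDF points (getting it is the "additional PDF parsing" mentioned above), and the pdffonts text is assumed to use its usual column layout ending in emb, sub, uni and a two-part object ID, with font names that contain no spaces.

```python
def effective_dpi(pixels_wide, pixels_high, placed_width_pt, placed_height_pt):
    """Return (horizontal, vertical) DPI of an image as placed on the page.

    Placed sizes are in PDF points (1 point = 1/72 inch).
    """
    if placed_width_pt <= 0 or placed_height_pt <= 0:
        raise ValueError("placed size must be positive")
    return (pixels_wide / (placed_width_pt / 72.0),
            pixels_high / (placed_height_pt / 72.0))


def unembedded_fonts(pdffonts_output):
    """Return the names of fonts whose 'emb' column reads 'no'.

    Assumes the listing ends each row with: emb sub uni <object> <generation>.
    """
    missing = []
    in_table = False
    for line in pdffonts_output.splitlines():
        if line.startswith("---"):
            in_table = True  # rows start after the dashed ruler
            continue
        tokens = line.split()
        if not in_table or len(tokens) < 8:
            continue
        # Counting from the right: generation, object, uni, sub, emb
        if tokens[-5] == "no":
            missing.append(tokens[0])
    return missing


# The 720x720 pixel image from the text, placed at 10x10 inch and at 1x1 inch
print(effective_dpi(720, 720, 720, 720))  # (72.0, 72.0)
print(effective_dpi(720, 720, 72, 72))    # (720.0, 720.0)
```

Reading the emb column from the right-hand end of each row avoids having to guess where the type column ends, since font types such as "Type 1" contain a space.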

So one Ghostscript command that would cover most of your requirements is this:

gs                                        \
  -o cmyk.pdf                             \
  -sDEVICE=pdfwrite                       \
  -sColorConversionStrategy=CMYK          \
  -sProcessColorModel=DeviceCMYK          \
  -sOutputICCProfile=/path/to/your.icc    \
  -dDownsampleColorImages=true            \
  -dColorImageDownsampleThreshold=2       \
  -dColorImageDownsampleType=/Bicubic     \
  -dColorImageResolution=300              \
  -dDownsampleGrayImages=true             \
  -dGrayImageDownsampleThreshold=2        \
  -dGrayImageDownsampleType=/Bicubic      \
  -dGrayImageResolution=300               \
  -dDownsampleMonoImages=true             \
  -dMonoImageDownsampleThreshold=2        \
  -dMonoImageDownsampleType=/Bicubic      \
  -dMonoImageResolution=1200              \
  -dSubsetFonts=true                      \
  -dEmbedAllFonts=true                    \
  -dCannotEmbedFontPolicy=/Error          \
  -c "<</NeverEmbed [ ]>> setdistillerparams" \
  -f some.pdf

This command downsamples every image whose resolution is more than twice the target resolution (*ImageDownsampleThreshold=2). It also applies all of these settings to any input file, unlike dedicated PDF preflighting software, which applies selective 'fixups' based on the results of 'checks' for specific properties.

Lastly, I cannot see what made you think you'd have to spend $2k to $6k if you had to resort to closed-source, commercial preflighting software. (My favorite in this field is the very powerful callas pdfToolbox 6, which even has a version that runs as a CLI on Linux -- its basic version costs 500 €.)
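
To make the threshold rule concrete, here is a minimal Python sketch of the decision as described above. The function name will_downsample is made up for illustration, and the handling of an image that sits exactly at the limit is an assumption rather than something checked against Ghostscript's implementation.

```python
def will_downsample(image_dpi, target_dpi=300, threshold=2.0):
    """Return True if an image would be downsampled to target_dpi.

    Mirrors the rule in the text: only images whose resolution is
    higher than threshold * target_dpi are touched.
    (Assumption: an image exactly at the limit is left alone.)
    """
    return image_dpi > target_dpi * threshold


for dpi in (150, 450, 600, 720):
    action = "downsample to 300" if will_downsample(dpi) else "leave as is"
    print(dpi, "->", action)
```

With a target of 300 dpi and a threshold of 2, a 450 dpi image is left untouched, so a preflight that needs everything at or below 300 dpi would have to use a lower threshold.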

Kurt Pfeifle answered Sep 18 '22 16:09


My background is in printing, so please keep this in mind when reading my answer. The items you propose seem fairly straightforward, but when you get into the nitty-gritty, there's a lot of print-industry knowledge that goes into these operations.

Here's some quick feedback to your bullet points:

  1. You won't want to upsample a low-res image to 300 dpi, as it will decrease image quality (via re-interpolation) and increase file size.

  2. You need to be careful with color conversions. There may be certain builds of RGB which you'd want to convert to black only. And what happens if someone supplies a file which is already CMYK but tagged with the incorrect profile?

  3. Font detection - it is very complicated to substitute fonts. If you don't have the exact same font as the originator, you could end up with text reflow problems. To own that font, you'll have to pay for a license. You also can't convert fonts to outlines without them being embedded.
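
As an illustration of the "black only" case in point 2, a pre-check along these lines could route neutral RGB values to the K channel before anything else is handed to a profile-based conversion. This is a rough sketch, not print-accurate color management: k_only_for_neutral is a hypothetical helper, the linear gray-to-K mapping ignores dot gain and tone curves, and the tolerance for what counts as "neutral" is an assumption you would have to tune.

```python
def k_only_for_neutral(r, g, b, tolerance=0):
    """Map a neutral 8-bit RGB value to a black-only CMYK tuple.

    Returns (C, M, Y, K) with components in 0..1 when the three
    channels differ by at most `tolerance`; returns None for
    chromatic colors, which should go through the ICC conversion.
    """
    if max(r, g, b) - min(r, g, b) > tolerance:
        return None
    gray = (r + g + b) / 3.0 / 255.0
    return (0.0, 0.0, 0.0, round(1.0 - gray, 4))


print(k_only_for_neutral(0, 0, 0))    # pure black text stays K only
print(k_only_for_neutral(255, 0, 0))  # chromatic, so None
```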

My recommendation is to look at a commercial package for preflighting. These developers have invested years in their programs and are experts in the field of printing. The challenging part will be finding one that is Unix-based and in your price range. Most are designed for Windows or Mac. Callas has a Linux command-line version, but not at the price listed. You'd need the server version.

What type of volume are you planning to run through it?

Greg Firestone answered Sep 22 '22 16:09


Did you try Enfocus PitStop Pro? Contact their support department with your specific request. They have tons of PDF preflight examples and will be happy to help you out.

Ramzi answered Sep 19 '22 16:09