 

Ghostscript to merge PDFs compresses the result

I found this neat command to merge multiple PDFs into one, using Ghostscript:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=out.pdf in1.pdf in2.pdf 

The resulting size is smaller than the combined size of the 2 PDFs.

Running the command with a single file as input still results in a smaller output file.

Is there an option in Ghostscript to just copy the pages as they are when merging, without doing any compression?

If not, is it possible that the Ghostscript compression is so good that it will result in absolutely no loss in quality?

Dimitris Baltas asked Nov 16 '11 20:11


2 Answers

Here are some additional options that you can pass when using pdfwrite as your device. According to the documentation, if you don't pass -dPDFSETTINGS, it gets set to something close to /screen, although the documentation doesn't get more specific. You could try setting it to -dPDFSETTINGS=/prepress, which should only compress things above 300 dpi.

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=out.pdf in1.pdf in2.pdf 
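To see how much each preset actually affects the result, one could run the same merge once per preset and compare the output sizes. This is only a sketch: try_presets and the out-<preset>.pdf naming are made up for illustration, and the preset names are the ones pdfwrite documents for -dPDFSETTINGS.

```shell
# Sketch: merge the same inputs once per pdfwrite preset and print each output size.
# try_presets is a hypothetical helper; out-<preset>.pdf is an assumed naming scheme.
try_presets() {
  for preset in default screen ebook printer prepress; do
    out="out-$preset.pdf"
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite \
       "-dPDFSETTINGS=/$preset" "-sOutputFile=$out" "$@" || return 1
    # wc -c prints the size in bytes; tr removes padding that some wc builds add
    printf '%s\t%s\n' "$preset" "$(wc -c < "$out" | tr -d ' ')"
  done
}
```

Called as try_presets in1.pdf in2.pdf, it prints one line per preset with the size in bytes, which makes it easier to tell whether /prepress really leaves the file closer to the original size.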

Another alternative is pdftk:

pdftk in1.pdf in2.pdf cat output out.pdf 
Chris Haas answered Sep 24 '22 07:09


Some of the size optimizations that you observed may come from Ghostscript's cleaning up of unused objects, its recently acquired font optimization improvements (are you using a very recent version of GS?) and possibly image re-/down-sampling that may have happened.
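One way to check whether images were actually re-sampled is to list the embedded images of the input and output files and compare their dimensions and resolution. A minimal sketch, assuming pdfimages from poppler-utils is installed (a tool not mentioned in the original question); list_images is a hypothetical helper name:

```shell
# Sketch: print the embedded-image table for each PDF given as an argument.
# Assumes the pdfimages tool (poppler-utils) is on the PATH.
list_images() {
  for f in "$@"; do
    printf '== %s ==\n' "$f"
    # -list prints one row per image (size, color space, encoding, ppi)
    pdfimages -list "$f"
  done
}
```

Running list_images in1.pdf out.pdf and comparing the width, height and ppi columns should show whether any image came out smaller than it went in.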

Ghostscript, if used for PDF -> PDF conversions, basically operates like this:

  1. Read in the input file(s) with all its objects and convert them into its internal format for graphical page representations.
  2. Do the manipulations asked for on the commandline to the page contents in the internal format.
  3. Write out a completely new PDF.

This means that for most PDF -> PDF operations you'll have different ordering and numbering for the PDF objects, and even the objects' internal code may have changed (even if your eyes don't discover any differences between input and output PDF).

By default Ghostscript will also compress any object streams that were uncompressed in the original file (but this is a lossless compression).
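To put a number on the size difference, a small helper can add up the input sizes for comparison with the output. This is just a sketch; total_bytes is a made-up name, and it only measures file size, not visual quality.

```shell
# Sketch: print the combined size in bytes of all files given as arguments.
total_bytes() {
  total=0
  for f in "$@"; do
    # wc -c counts bytes; tr removes padding that some wc builds add
    size=$(wc -c < "$f" | tr -d ' ')
    total=$((total + size))
  done
  printf '%s\n' "$total"
}
```

For example, echo "in: $(total_bytes in1.pdf in2.pdf) out: $(total_bytes out.pdf)" shows both totals side by side after a merge.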

Now for your very simplistic commandline which does not contain any wishes for manipulations, Ghostscript assumes you want to use -dPDFSETTINGS=/default, sets this parameter implicitly and operates accordingly.

Now what are the /default PDFSETTINGS?! You have two options to find out:

  1. Read the manual. The large table in the middle of the relevant section gives an overview. You can see that -dPDFSETTINGS=/default is in itself just a shorthand for the several dozen other, more specific settings which it represents. The linked documentation is for the current HEAD of the development code, and the version you actually use may differ, of course.

  2. Query (your own) Ghostscript for the detailed meaning of this setting. My answers to question 'Querying Ghostscript for the default options/settings of an output device...' and question 'What are PostScript dictionaries, and how can they be accessed (via Ghostscript)?' do elaborate a bit more on this. In short, to query Ghostscript for the details of its /default PDFSETTINGS, run this command:

     gs \
       -q \
       -dNODISPLAY \
       -c ".distillersettings /default get {exch ==only ( ) print ===} forall quit"

    You should get a result very similar to this:

      /Optimize false
      /DoThumbnails false
      /PreserveEPSInfo true
      /ColorConversionStrategy /LeaveColorUnchanged
      /DownsampleMonoImages false
      /EmbedAllFonts true
      /CannotEmbedFontPolicy /Warning
      /PreserveOPIComments true
      /GrayACSImageDict << /HSamples [2 1 1 2] /VSamples [2 1 1 2] /QFactor 0.9 /Blend 1 >>
      /DownsampleColorImages false
      /PreserveOverprintSettings true
      /CreateJobTicket false
      /AutoRotatePages /PageByPage
      /NeverEmbed [/Courier /Courier-Bold /Courier-Oblique /Courier-BoldOblique /Helvetica /Helvetica-Bold /Helvetica-Oblique /Helvetica-BoldOblique /Times-Roman /Times-Bold /Times-Italic /Times-BoldItalic /Symbol /ZapfDingbats]
      /ColorACSImageDict << /HSamples [2 1 1 2] /VSamples [2 1 1 2] /QFactor 0.9 /Blend 1 >>
      /DownsampleGrayImages false
      /UCRandBGInfo /Preserve

    The only point that stands out from these: you may want to change /AutoRotatePages from /PageByPage to /None. On the commandline you would put it as -dAutoRotatePages=/None.

    To tell Ghostscript to employ as much of a passthrough mode as it possibly can for the input PDF, add these parameters:

      -dAntiAliasColorImage=false \
      -dAntiAliasGrayImage=false \
      -dAntiAliasMonoImage=false \
      -dAutoFilterColorImages=false \
      -dAutoFilterGrayImages=false \
      -dDownsampleColorImages=false \
      -dDownsampleGrayImages=false \
      -dDownsampleMonoImages=false \
      -dColorConversionStrategy=/LeaveColorUnchanged \
      -dConvertCMYKImagesToRGB=false \
      -dConvertImagesToIndexed=false \
      -dUCRandBGInfo=/Preserve \
      -dPreserveHalftoneInfo=true \
      -dPreserveOPIComments=true \
      -dPreserveOverprintSettings=true \

So you could try this command:

gs                                               \
  -o output.pdf                                  \
  -sDEVICE=pdfwrite                              \
  -dAntiAliasColorImage=false                    \
  -dAntiAliasGrayImage=false                     \
  -dAntiAliasMonoImage=false                     \
  -dAutoFilterColorImages=false                  \
  -dAutoFilterGrayImages=false                   \
  -dDownsampleColorImages=false                  \
  -dDownsampleGrayImages=false                   \
  -dDownsampleMonoImages=false                   \
  -dColorConversionStrategy=/LeaveColorUnchanged \
  -dConvertCMYKImagesToRGB=false                 \
  -dConvertImagesToIndexed=false                 \
  -dUCRandBGInfo=/Preserve                       \
  -dPreserveHalftoneInfo=true                    \
  -dPreserveOPIComments=true                     \
  -dPreserveOverprintSettings=true               \
  input1.pdf                                     \
  input2.pdf
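If you merge files often, the long command above could be wrapped in a shell function so the flags only need to be typed once. This is a sketch rather than a tested recipe: merge_passthrough is a hypothetical name, and the flags are simply the ones listed above.

```shell
# Sketch: merge_passthrough OUT IN...
# Hypothetical wrapper that passes the "as little re-processing as possible"
# flags from the command above, followed by the input files.
merge_passthrough() {
  out=$1
  shift
  gs -o "$out" -sDEVICE=pdfwrite \
    -dAntiAliasColorImage=false \
    -dAntiAliasGrayImage=false \
    -dAntiAliasMonoImage=false \
    -dAutoFilterColorImages=false \
    -dAutoFilterGrayImages=false \
    -dDownsampleColorImages=false \
    -dDownsampleGrayImages=false \
    -dDownsampleMonoImages=false \
    -dColorConversionStrategy=/LeaveColorUnchanged \
    -dConvertCMYKImagesToRGB=false \
    -dConvertImagesToIndexed=false \
    -dUCRandBGInfo=/Preserve \
    -dPreserveHalftoneInfo=true \
    -dPreserveOPIComments=true \
    -dPreserveOverprintSettings=true \
    "$@"
}
```

Usage would then be merge_passthrough output.pdf input1.pdf input2.pdf. Quoting "$@" keeps input filenames with spaces intact.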

Finally, as Chris Haas already hinted at: you can also use pdftk if you specifically do not want any of the optimizations that Ghostscript applies by default. pdftk is simply unable to do such things, and you'll gain quite a bit of speed for its relative dumbness of operation (but probably also much larger file size outputs than from Ghostscript).

Kurt Pfeifle answered Sep 25 '22 07:09