Reducing PDF file size using Ghostscript on Linux didn't work

I have about 50-60 PDF files (images) that are about 1.5 MB each. I don't want such large PDF files in my thesis, as that would make downloading, reading and printing a pain in the rear. So I tried using Ghostscript as follows:

gs \
  -dNOPAUSE -dBATCH \
  -sDEVICE=pdfwrite \
  -dCompatibilityLevel=1.4 \
  -dPDFSETTINGS="/screen" \
  -sOutputFile=output.pdf \
    L_2lambda_max_1wl_E0_1_zg.pdf

However, my 1.4 MB PDF is now 1.5 MB.

What did I do wrong? Is there some way I can check the resolution of the PDF file? I only need 300 dpi images. Would anyone suggest using convert to change the resolution, or is there some way I could reduce the image resolution with gs? The image is very grainy when I use convert.

This is how I use convert:

 convert \
     -units PixelsPerInch \
      ~/Desktop/L_2lambda_max_1wl_E0_1_zg.pdf \
     -density 600 \
      ~/Desktop/output.pdf
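A possible cause of the graininess, offered as a hedge: ImageMagick applies -density to the rasterization of a vector input such as a PDF only when the option appears before the input file. Placed after the input, as above, it most likely just sets the resolution recorded in the output. The sketch below moves it in front of the input. The helper name pdf_to_density and the DRY_RUN switch (print the command instead of running it) are illustrative assumptions, not part of ImageMagick.

```bash
#!/bin/bash
# Sketch: rasterize a PDF with ImageMagick at a chosen density.
# Assumption: -density must come BEFORE the input file so that it controls
# the resolution at which the PDF is read (rendered).
# 'pdf_to_density' and DRY_RUN are hypothetical names for illustration.

pdf_to_density() {
    local input="$1" output="$2" dpi="${3:-300}"
    local cmd=(convert -units PixelsPerInch -density "$dpi" "$input" "$output")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the command line, joined by spaces.
        printf '%s\n' "${cmd[*]}"
    else
        # Actually run convert.
        "${cmd[@]}"
    fi
}
```

For example, `pdf_to_density ~/Desktop/L_2lambda_max_1wl_E0_1_zg.pdf ~/Desktop/output.pdf 300` would render the page at 300 dpi.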

Example File

http://dl.dropbox.com/u/13223318/L_2lambda_max_1wl_E0_1_zg.pdf

asked Aug 07 '12 17:08 by dearN


1 Answer

DNA decided to go for grayscale PNGs. He creates them in two steps:

  1. Step 1: Convert a color PDF page (such as the example file above) to a grayscale PDF page, using Ghostscript's pdfwrite device with the settings -dColorConversionStrategy=/Gray and -dProcessColorModel=/DeviceGray.
  2. Step 2: Convert the grayscale PDF page to a PNG, using Ghostscript's pngalpha device at a resolution of 300 dpi (-r300 on the gs command line).
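The two steps above can be sketched as a small bash function for a single 1-page PDF. The function name two_step_gray_png, the helper run_or_print, the output naming scheme and the DRY_RUN switch are hypothetical; the gs options are the ones named in the steps.

```bash
#!/bin/bash
# Sketch of the two-step workflow for one 1-page PDF:
#   1. color PDF -> grayscale PDF (pdfwrite)
#   2. grayscale PDF -> PNG at 300 dpi (pngalpha)
# 'two_step_gray_png', 'run_or_print' and DRY_RUN are hypothetical names.

run_or_print() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"   # show the command only
    else
        "$@"                 # run it
    fi
}

two_step_gray_png() {
    local input="$1"
    local base="${input%.pdf}"
    # Step 1: convert colors to gray while staying in PDF.
    run_or_print gs -o "${base}-gray.pdf" -sDEVICE=pdfwrite \
        -dColorConversionStrategy=/Gray -dProcessColorModel=/DeviceGray \
        -dCompatibilityLevel=1.4 "$input" || return 1
    # Step 2: rasterize the grayscale PDF at 300 dpi, keeping an alpha channel.
    run_or_print gs -o "${base}-gray.png" -sDEVICE=pngalpha -r300 "${base}-gray.pdf"
}
```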

This reduces his initial file size of 1.4 MB to 0.7 MB.

But this workflow has the following disadvantage:

  • It loses all color information, without saving much disk space compared to color output rendered directly from the PDF at the same resolution!

There are 2 alternatives to DNA's workflow:

  1. A one-step conversion of (color) PDF -> (color) PNG, using Ghostscript's pngalpha device with the original PDF as input (same 300 dpi resolution). This would have the following advantage:

    • It would keep the color information in the PNG output, requiring only a little more space on disk!
  2. A one-step conversion of (color) PDF -> grayscale PNG, using Ghostscript's pnggray device with the original PDF as input (same 300 dpi resolution), with this mix of advantages and disadvantages:

    • It would lose the color information in the PNG output.
    • It would lose the transparent background that was preserved in DNA's workflow.
    • It would save lots of disk space, because the file size would go down to about 20% of the output from DNA's workflow.
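Both one-step alternatives differ only in the output device, so a sketch can pick the device from a mode argument. The names one_step_png, run_or_print and DRY_RUN, and the output file names, are hypothetical; the resolution defaults to 300 dpi but can be overridden.

```bash
#!/bin/bash
# Sketch of the two one-step alternatives, both reading the original color PDF:
#   color -> pngalpha (keeps color and transparent background)
#   gray  -> pnggray  (drops color and transparency, smallest file)
# 'one_step_png', 'run_or_print' and DRY_RUN are hypothetical names.

run_or_print() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"   # show the command only
    else
        "$@"                 # run it
    fi
}

one_step_png() {
    local input="$1" mode="${2:-color}" res="${3:-300}" device
    case "$mode" in
        color) device=pngalpha ;;
        gray)  device=pnggray ;;
        *) echo "unknown mode: $mode (use color or gray)" >&2; return 2 ;;
    esac
    run_or_print gs -o "${input%.pdf}-${mode}.png" -sDEVICE="$device" -r"$res" "$input"
}
```

For example, `one_step_png figure.pdf gray 400` would write figure-gray.png with the pnggray device at 400 dpi.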

So you can make up your mind and see the output sizes and quality side-by-side, here is a shell script to demonstrate the differences:

#!/bin/bash
#
# Copyright (c) 2012 <[email protected]>
# License: Creative Commons (CC BY-SA 3.0) 

function echo_do() {
        echo
        echo "Command:     ${*}"
        echo "--------"
        echo
        "${@}"
}

[ -d out ] || mkdir out

echo 
echo "    We assume all PDF pages are 1-page PDFs!"
echo "    (otherwise we'd have to include something like '%03d'"
echo "    into the output filenames in order to get paged output)"
echo

echo '
 # Convert Color PDF to Grayscale PDF.
 # If the PDF has a transparent background (most do),
 # this will remain transparent in the output.
 # ATTENTION: since we do not set a resolution,
 # pdfwrite will use its default value of "-r720".
 # (However, this setting will only affect raster objects...)
'
for i in *.pdf
do
echo_do gs \
 -o "out/${i}---pdfwrite-devicegray-gs.pdf" \
 -sDEVICE=pdfwrite \
 -dColorConversionStrategy=/Gray \
 -dProcessColorModel=/DeviceGray \
 -dCompatibilityLevel=1.4 \
  "${i}"
done

echo '
 # Convert (previously generated) grayscale PDF to PNG using Alpha channel
 # (Alpha channel can make backgrounds transparent)
'
for i in out/*pdfwrite-devicegray*.pdf
do
echo_do gs \
 -o "out/$(basename "${i}")---pngalpha-from-pdfwrite-devicegray-gs.png" \
 -sDEVICE=pngalpha \
 -r300 \
  "${i}"
done

echo '
 # Convert (color) PDF to grayscale PNG using Alpha channel 
 # (Alpha channel can make backgrounds transparent)
'
for i in *.pdf
do
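# NOTE: -dColorConversionStrategy is a pdfwrite (Distiller) parameter; the
# pngalpha raster device most likely ignores it, so this output may stay in color.
# Likewise, the following is only required for 'pdfwrite', not for 'pngalpha':
#                -dProcessColorModel=/DeviceGray 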
echo_do gs \
 -o "out/${i}---pngalphagray_gs.png" \
 -sDEVICE=pngalpha \
 -dColorConversionStrategy=/Gray \
 -r300 \
  "${i}"
done

echo '
 # Convert (color) PDF to (color) PNG using Alpha channel
 # (Alpha channel can make backgrounds transparent)
'
for i in *.pdf
do
echo_do gs \
 -o "out/${i}---pngalphacolor_gs.png" \
 -sDEVICE=pngalpha \
 -r300 \
  "${i}"
done

echo '
 # Convert (color) PDF to grayscale PNG 
 # (no Alpha channel here, therefore [mostly] white backgrounds)
'
for i in *.pdf
do
echo_do gs \
 -o "out/${i}---pnggray_gs.png" \
 -sDEVICE=pnggray  \
 -r300 \
  "${i}"
done

echo " All output to be found in ./out/ ..."
echo

Run this script and compare the different outputs side by side.
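Since every output of the script lands in ./out/, a small helper can list the files by size in bytes for the side-by-side comparison. list_sizes is a hypothetical name; it only relies on wc, tr, sort and basename.

```bash
#!/bin/bash
# Sketch: list every regular file in a directory (default ./out) with its
# size in bytes, smallest first.
# 'list_sizes' is a hypothetical helper name.

list_sizes() {
    local dir="${1:-out}" f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
        # wc -c < file prints the byte count; tr strips padding some wc versions add.
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$(basename "$f")"
    done | sort -n
}
```

Running `list_sizes out` after the script prints one "bytes<TAB>filename" line per output file.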

Yes, the direct grayscale PNG made from the color PDF with the pnggray device may look a bit worse than the output of DNA's workflow (and it lacks the transparent background), but it is also only about 20% of that file's size. On the other hand, if you want to buy a bit more quality by sacrificing a bit of disk space, you could use -r400 instead of -r300...
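The resolution trade-off is quadratic: a raster device renders a page of W x H points (1 pt = 1/72 inch) at roughly W*dpi/72 by H*dpi/72 pixels, so going from -r300 to -r400 multiplies the pixel count by (400/300)^2 ≈ 1.78. The sketch below does that arithmetic. raster_size is hypothetical, US Letter (612 x 792 pt) is only an example because the page size of the thesis figures is unknown, and Ghostscript's own rounding may differ by a pixel.

```bash
#!/bin/bash
# Sketch: estimate raster output dimensions for a page of W x H points
# (1 pt = 1/72 inch) at a given -r value, using integer arithmetic.
# 'raster_size' is a hypothetical helper; page size is supplied by hand.

raster_size() {
    local w_pt="$1" h_pt="$2" dpi="$3"
    # pixels = points * dpi / 72, rounded to the nearest integer (+36 = half of 72)
    local w_px=$(( (w_pt * dpi + 36) / 72 ))
    local h_px=$(( (h_pt * dpi + 36) / 72 ))
    echo "${w_px}x${h_px} ($(( w_px * h_px )) pixels)"
}
```

For a Letter page, `raster_size 612 792 300` prints `2550x3300 (8415000 pixels)` and `raster_size 612 792 400` prints `3400x4400 (14960000 pixels)`. The PNG file will not necessarily grow by exactly that factor, since compression depends on the content.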

answered Sep 20 '22 05:09 by Kurt Pfeifle