
Processing PDFs to reduce file size and/or complexity

I have PDF files that I need to prepare for viewing on mobile devices. The worst case would be ~50 pages with lots of full-color images and vector art, at a file size of approx. 40 MB. This is acceptable for PC viewing on broadband, but not great for mobile viewing due to long download times and very laggy scrolling (at least on my overclocked Droid). Are there any tools or libraries for processing the files to simplify the vector art, downsample/recompress the images, that sort of thing?

Output in PDF format is not absolutely essential, but it needs to be something readable on Android and iOS devices without software downloads.

Tyler Eaves asked Dec 31 '10 19:12


2 Answers

There are a few main things that can blow up the size of a PDF on mobile devices:

  • high-resolution pictures (where low-resolution ones would suffice)
  • embedded fonts (where content would still be readable "good enough" without them)
  • PDF content no longer required for the current version/view (older versions of certain objects)
  • embedded ICC profiles
  • embedded third-party files (using the PDF as a container)
  • embedded job tickets (for printing)
  • embedded JavaScript
  • and a few more
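
To get a rough idea of which of these items a given file contains, a raw byte scan can serve as a first check. The following is only a heuristic sketch: the function name `scan_pdf_bytes` and the marker table are made up for illustration, the patterns are ordinary PDF name objects, and anything stored inside compressed object streams will not be seen, so the counts may be too low. A linearized PDF can also contain more than one `%%EOF` without having been edited.

```python
import re

# Byte patterns for PDF names that often point at size-heavy content.
# Heuristic only: objects inside compressed object streams are invisible
# to a raw byte scan, so the counts are lower bounds at best.
MARKERS = {
    "raster images": rb"/Subtype\s*/Image\b",
    "embedded font programs": rb"/FontFile[23]?\b",
    "ICC profiles": rb"/ICCBased\b",
    "embedded files": rb"/Type\s*/EmbeddedFile\b",
    "JavaScript": rb"/JavaScript\b",
    # Each incremental save appends another %%EOF marker.
    "%%EOF markers": rb"%%EOF",
}


def scan_pdf_bytes(data: bytes) -> dict:
    """Count occurrences of each marker pattern in the raw PDF bytes."""
    return {label: len(re.findall(pattern, data))
            for label, pattern in MARKERS.items()}


# Example use (the file name is a placeholder):
#   with open("blown-up.pdf", "rb") as fh:
#       for label, count in scan_pdf_bytes(fh.read()).items():
#           print(f"{label}: {count}")
```

A high image count together with a large file usually suggests that downsampling is the first thing to try; several `%%EOF` markers hint at old object generations that a rewrite can drop.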

FOSS software: Ghostscript can try to size down your PDFs, mainly by re-sampling the pictures used and by removing older versions ("generations") of PDF objects which were replaced by newer ones:

gswin32c.exe ^
  -o sized-down.pdf ^
  -sDEVICE=pdfwrite ^
  -dPDFSETTINGS=/ebook ^
  -dEmbedAllFonts=false ^
  -c ".setpdfwrite <</AlwaysEmbed [ ]>>" ^
  -f blown-up.pdf

You can add more parameters to the command line above to size down certain PDFs even more (for example, by setting a lower maximum resolution). Here is an example that forces downsampling of color and grayscale images to 72 dpi:

gswin32c.exe ^
  -o sized-down.pdf ^
  -sDEVICE=pdfwrite ^
  -dPDFSETTINGS=/ebook ^
  -dEmbedAllFonts=false ^
  -dColorImageDownsampleThreshold=1.0 ^
  -dColorImageDownsampleType=/Average ^
  -dColorImageResolution=72 ^
  -dGrayImageDownsampleThreshold=1.0 ^
  -dGrayImageDownsampleType=/Average ^
  -dGrayImageResolution=72 ^
  -c ".setpdfwrite <</AlwaysEmbed [ ]>>" ^
  -f blown-up.pdf
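
If many files need this treatment, the same call can be scripted. Below is a minimal Python sketch that assembles the arguments shown above with a configurable resolution. The function names, the `dpi` parameter and the default executable name `gs` are assumptions for illustration (on Windows you would pass `gswin32c.exe`), and the `-c` PostScript snippet is left out to keep the sketch short.

```python
import subprocess


def build_gs_shrink_command(src, dst, dpi=72, gs_exe="gs"):
    """Return the Ghostscript argument list for downsampling images to `dpi`."""
    args = [
        gs_exe,
        "-o", dst,
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/ebook",
        "-dEmbedAllFonts=false",
    ]
    # Same three settings for color and grayscale images.
    for kind in ("Color", "Gray"):
        args += [
            f"-d{kind}ImageDownsampleThreshold=1.0",
            f"-d{kind}ImageDownsampleType=/Average",
            f"-d{kind}ImageResolution={dpi}",
        ]
    args += ["-f", src]
    return args


def shrink_pdf(src, dst, dpi=72, gs_exe="gs"):
    """Run Ghostscript; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_gs_shrink_command(src, dst, dpi, gs_exe), check=True)
```

Passing the arguments as a list avoids shell quoting problems, and keeping the command builder separate from the runner makes it possible to inspect the command before executing it.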

Commercial, closed-source software: callas pdfToolbox4 is able to reduce file sizes even more by applying a custom profile to the PDF downsizing process (it can even un-embed fonts and ICC profiles).


Update 2: See also the following (new) question with the answer:

  • How can I remove all images from a PDF?

It provides some sample PostScript code which completely removes all (raster) images from the PDF, leaving the rest of the page layout unchanged. This is useful in cases where you do not want the (raster) images, but only the text parts in order to reduce file size.

Kurt Pfeifle answered Sep 29 '22 23:09


Adobe Acrobat Professional has two built-in tools for optimizing PDF files:

"PDF Optimizer" - http://www.adobe.com/designcenter/acrobat/articles/acr7optimize.html, which will simplify vectors and remove unneeded content (among other things)

and

"Optimize Scanned PDF" - http://help.adobe.com/en_US/Acrobat/9.0/Standard/WS58a04a822e3e50102bd615109794195ff-7f71.w.html#WS0BEFAC0B-47D9-47b8-9AF8-4DE2FE9C9736.w, which will downsample and compress embedded raster images.

Both are the best tools I have used for what they do. However, most PDF optimization tools focus on reducing file size, not on improving rendering speed.

If you want to drastically improve rendering performance on your device, you should consider pre-rendering the PDFs to bitmap images. If you scale them up a bit before rasterizing (to allow for on-device zooming) and stick to an indexed color scheme, you should be able to produce rasters for each page that have an acceptable file size and resolution. They will draw much more quickly on the device than vector content would.
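
One possible way to do the pre-rendering is Ghostscript's `png256` device, which writes 8-bit indexed-color PNGs, one file per page. This is a sketch under assumptions rather than a recommendation of specific values: the function name, the output file pattern and the default of 150 dpi are illustrative choices, and a higher `dpi` leaves more headroom for zooming at the cost of larger files.

```python
import subprocess


def build_gs_raster_command(src, out_pattern="page-%03d.png", dpi=150, gs_exe="gs"):
    """Return the Ghostscript argument list for rendering each page to a PNG."""
    return [
        gs_exe,
        "-o", out_pattern,      # %03d is replaced by the page number
        "-sDEVICE=png256",      # 8-bit indexed color
        f"-r{dpi}",             # render resolution in dpi
        src,
    ]


def rasterize_pdf(src, out_pattern="page-%03d.png", dpi=150, gs_exe="gs"):
    """Run Ghostscript; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_gs_raster_command(src, out_pattern, dpi, gs_exe), check=True)
```

The resulting page images can then be served individually, so the device only downloads and decodes the pages actually viewed.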

Josh Knauer answered Sep 29 '22 23:09