
Compressing/Optimizing Vectors in PDF

I have a PDF of a scanned book; the images are in JBIG2 format (B&W). I'd like to convert this to a vector PDF, which I can do easily by extracting the images and converting them to PDF vector graphics instructions with potrace.

The reason for this is that I want the PDF to display smoothly and quickly on an ebook reader device, such as a Kindle. With JBIG2 it is not doing this very well. Depending on the settings, the Kindle can't display the PDF, and even with that fixed it takes a long time to render each page. With a vector PDF the performance is much better, and the rendering very crisp.

The problem is that the resulting PDF is gigantic in filesize. Even with the streams gzcompressed to the max it is 300KB per page (original JBIG2 images were 30KB per page).

Is there any way I can optimize the vector graphics so that the filesize is much less?

Here is a segment of the vector drawing instructions:

0.100000 0.000000 0.000000 0.100000 0.000000 0.000000 cm
0 g
8277 29404 m
8263 29390 8270 29370 8289 29370 c
8335 29370 8340 29361 8340 29284 c
8340 29220 8338 29210 8323 29210 c
8194 29207 8141 29208 8132 29214 c
8125 29218 8120 29248 8120 29289 c
8120 29356 8121 29358 8150 29370 c
8201 29391 8184 29400 8095 29400 c
8004 29400 7986 29388 8033 29357 c
8056 29342 8057 29338 8057 29180 c
8058 29018 l
8029 29008 l
8012 29002 8001 28993 8003 28986 c
h
f

I would have thought that the numbers could be compressed down very easily, but apparently not. One page is 800KB uncompressed (as above) and 300KB gzcompressed. I would have thought that the compression ratio could be much better, considering how the instructions are all numbers in similar ranges.

Alasdair asked Nov 13 '22 12:11


1 Answer

I am afraid there's not much that can be done about this.

Of course, you might try to use LZW compression on PDF page streams (instead of Deflate) but it probably won't make much difference.

Other suggestions:

  • Smooth the source image as much as possible and remove as many details as possible. This might produce fewer curves (i.e. less data) during conversion.
  • Try to optimize the values in the PDF page stream. For example, you might try to use combinations of scale / translate operators together with changes to the data. The goal here is to reduce the length of the operands.

For example, you might try to divide all operands (using integer, not floating-point, division) by, say, 100 and add a compensating scale before the first path operator. This approach will most probably degrade the visual quality, though.
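A minimal Python sketch of that idea follows. It assumes the content stream has one operator per line, as in the question's excerpt, that it contains no q/Q state saves, and that paths are only filled (a stroke's line width would be scaled too). The name shrink_operands and the two constants are made up for illustration and are not part of any PDF library.

```python
import re

# One numeric operand, e.g. "8277", "-12", "0.5" (illustrative helper).
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")
# Path construction operators whose operands are all coordinates.
PATH_OPS = {"m", "l", "c", "v", "y", "re"}


def shrink_operands(stream, divisor=100):
    """Integer-divide path coordinates and insert a compensating scale."""
    out = []
    scale_added = False
    for line in stream.splitlines():
        tokens = line.split()
        is_path_line = (
            len(tokens) > 1
            and tokens[-1] in PATH_OPS
            and all(NUMBER.fullmatch(t) for t in tokens[:-1])
        )
        if not is_path_line:
            out.append(line)  # leave "0 g", "h", "f", existing "cm" untouched
            continue
        if not scale_added:
            # Scale back up just before the first path, after any earlier cm.
            out.append(f"{divisor} 0 0 {divisor} 0 0 cm")
            scale_added = True
        operands = [str(int(float(t)) // divisor) for t in tokens[:-1]]
        out.append(" ".join(operands + [tokens[-1]]))
    return "\n".join(out)


sample = "\n".join([
    "0.1 0 0 0.1 0 0 cm",
    "0 g",
    "8277 29404 m",
    "8263 29390 8270 29370 8289 29370 c",
    "8058 29018 l",
    "h",
    "f",
])
print(shrink_operands(sample))
```

Running it on the excerpt shows the quality cost directly: with a divisor of 100 the short curve collapses to almost a single point. Since the excerpt's coordinates are already in tenths of a unit (the 0.1 scale in the first line), a divisor of 10 would be a gentler starting point.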

And of course, if you are going to do this to only a handful of files then I would say it's not worth the time.

Bobrovsky answered Mar 07 '23 12:03