
scale and reduce colors to reduce file size of scan

I need to reduce the file size of a color scan.

So far, I think the following steps are needed:

  • selective blur (or similar) to reduce noise
  • scale to ~120dpi
  • reduce colors

So far we use convert (ImageMagick) and the Netpbm tools.

The scans are invoices, not photos.

Any hints appreciated.

Update

example:

  • http://www.thomas-guettler.de/tbz/example.png 11M
  • http://www.thomas-guettler.de/tbz/example_0800_pnmdepth009.png pnmscale, pnmdepth 110K
  • http://www.thomas-guettler.de/tbz/example_1000_pnmdepth006.png pnmscale, pnmdepth 116K

Bounty

The smallest reduced version of example.png that is still easily readable, produced by a reproducible solution, gets the bounty. The solution needs to use open source software only.

The file format is not important, as long as you can convert it to PNG again. Processing time is not important. I can optimize later.

Update

I got very good results for black-and-white output (thank you). Reducing the colors to about 16 or 32 would also be interesting.

guettli asked Jan 25 '12 08:01


1 Answer

This is a rather open-ended question, since there is still room to trade image quality against image size. After all, making the scan black and white and compressing it with CCITT T.6 (fax-style) compression is going to beat the pants off most, if not all, color-capable compression algorithms.

If you're willing to go black and white (not grayscale), do that! It makes documents very small.
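
A minimal sketch of the black-and-white route, assuming ImageMagick's convert was built with TIFF support. The 50% resize and the 50% threshold are placeholder values to tune per scanner, not settings taken from this answer, and to_bilevel is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: reduce a color scan to 1-bit and store it with CCITT Group 4 (T.6).
# Assumption: `convert` is ImageMagick with TIFF support compiled in.
# Assumption: 50% resize and 50% threshold are starting points, not tuned values.
to_bilevel() {
    src=$1
    dst=$2
    convert "$src" -resize 50% -colorspace Gray -threshold 50% \
        -compress Group4 "$dst"
}

# Usage:
#   to_bilevel example.png example_bw.tif
#   convert example_bw.tif example_bw.png   # back to PNG when needed
```

Thresholding throws away all color and gray levels, so it is worth checking a few threshold values by eye before settling on one.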

Otherwise I recommend a series of minor image transformations followed by Adaptive Prediction Trees (APT). The APT software package is open source or public domain and very easy to compile and use. Its advantages are that it performs well on a wide variety of image types, especially text, and that it lets you trade image size against image quality more gradually without losing readability. (I managed to squeeze an example_1000-sized color version down to 48 KB at the threshold of readability, and to 64 KB with obvious artifacts but easy readability.)

I combined APT with some ImageMagick tweakery:

convert example.png -resize 50% -selective-blur 0x4+10% -brightness-contrast -5x30 -resize 80% example.ppm
./capt example.ppm example.apt 20  # The 20 means quality in the range [0,100]

And to reverse the process:

./dapt example.apt out_example.ppm
convert out_example.ppm out_example.png

To explain the ImageMagick settings:

  • -resize 50%: Halve the image size to make processing faster. This also hides some print and scan artifacts.
  • -selective-blur 0x4+10%: Sharpening would actually add noise. What you want instead is a selective blur (like the one in Photoshop), which blurs only where there is no "edge".
  • -brightness-contrast -5x30: This raises the contrast a good deal to clip the discoloration caused by the page outline, which would otherwise make the data less compressible. It also darkens the image slightly to make the blacks blacker.
  • -resize 80%: Finally, resize to slightly larger than your example_1000 image. (Close enough.) This also hides some of the obvious artifacts, since they blend in when pixels are merged.
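
As a quick check of what the two resize steps add up to, here is a small arithmetic sketch. The helper names net_scale_percent and pixel_percent are made up for illustration, and ImageMagick rounds the dimensions at each step, so real sizes may differ by a pixel.

```shell
#!/bin/sh
# Combined linear scale of two successive percentage resizes.
net_scale_percent() {
    echo $(( $1 * $2 / 100 ))
}

# Share of the original pixel count left at a given linear scale (percent).
pixel_percent() {
    echo $(( $1 * $1 / 100 ))
}

linear=$(net_scale_percent 50 80)
echo "linear scale: ${linear}%"                   # 50% then 80% gives 40%
echo "pixel count: $(pixel_percent "$linear")%"   # 40% per side leaves 16%
```

So the compressor sees roughly one sixth of the original pixels, which accounts for a good part of the size reduction before APT does anything.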

At this point you're going to have a fine looking image in this example -- nice, smooth colors and crisp text. Then we compress. The quality value of 20 is a pretty low setting and it's not as spiffy looking anymore, but the document is very legible. Even at a quality value of 0 it's still mostly legible.

Again, using APT won't necessarily give the best results for this particular image, but it won't turn photographic-like content such as gradients into an unrecognizable mess, so you should be better covered across more, or unexpected, types of documents.

Results: 88 KB, 76 KB, 64 KB, 48 KB
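
To compare sizes at several quality settings, a loop along these lines could be used. It is only a sketch: sweep_quality and the CAPT variable are hypothetical names, the encoder is called with the same three arguments as in the commands above, and the quality values in the usage line are illustrative rather than the ones behind the sizes listed here.

```shell
#!/bin/sh
# Sketch: compress one PPM at several quality levels and list the sizes.
# Assumption: CAPT points at the APT encoder, invoked as: capt in.ppm out.apt quality
CAPT=${CAPT:-./capt}

sweep_quality() {
    src=$1
    shift
    for q in "$@"; do
        out="${src%.ppm}_q${q}.apt"
        "$CAPT" "$src" "$out" "$q" || return 1
        # wc -c reports the file size in bytes
        printf '%s\t%s bytes\n' "$out" "$(wc -c < "$out" | tr -d ' ')"
    done
}

# Usage:
#   sweep_quality example.ppm 0 10 20 30
```

Decoding each result with dapt and viewing it is still needed, since file size alone says nothing about legibility.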

Processed image before compression

Kaganar answered Sep 29 '22 07:09