More effective lossless compression for TIFF

Tags:

libtiff.net

I am trying to archive TIFF images in a database, and I would like to compress the images as much as possible, even at the cost of higher CPU and memory usage.

In order to test the compressions available in LibTiff.NET, I used the following code (modified from this sample):

//getImageRasterBytes and convertSamples are defined in the sample
void Main() {
    foreach (Compression cmp in Enum.GetValues(typeof(Compression))) {
        try {
            using (Bitmap bmp = new Bitmap(@"D:\tifftest\200 COLOR.tif")) {
                using (Tiff tif = Tiff.Open($@"D:\tifftest\output_{cmp}.tif", "w")) {
                    byte[] raster = utils.getImageRasterBytes(bmp, PixelFormat.Format24bppRgb);
                    tif.SetField(TiffTag.IMAGEWIDTH, bmp.Width);
                    tif.SetField(TiffTag.IMAGELENGTH, bmp.Height);
                    tif.SetField(TiffTag.COMPRESSION, cmp);
                    tif.SetField(TiffTag.PHOTOMETRIC, Photometric.RGB);

                    tif.SetField(TiffTag.ROWSPERSTRIP, bmp.Height);

                    tif.SetField(TiffTag.XRESOLUTION, bmp.HorizontalResolution);
                    tif.SetField(TiffTag.YRESOLUTION, bmp.VerticalResolution);

                    tif.SetField(TiffTag.BITSPERSAMPLE, 8);
                    tif.SetField(TiffTag.SAMPLESPERPIXEL, 3);

                    tif.SetField(TiffTag.PLANARCONFIG, PlanarConfig.CONTIG);

                    int stride = raster.Length / bmp.Height;
                    utils.convertSamples(raster, bmp.Width, bmp.Height);

                    for (int i = 0, offset = 0; i < bmp.Height; i++) {
                        tif.WriteScanline(raster, offset, i, 0);
                        offset += stride;
                    }
                }
            }
        } catch (Exception ex) {
            //code was run in LINQPad
            ex.Dump(cmp.ToString());
        }
    }
}

The test image is 200 dpi, 24 bpp, 1700 pixels wide by 2200 pixels high, and uses LZW compression; the file size is nearly 7 MB. (The image is representative of the images I want to store.)

Of the algorithms that did work (some failed with various errors), the smallest compressed file was created using Compression.Deflate, but that only compressed to 5 MB, and I would like it to be significantly smaller (under 1 MB).
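To put those sizes in perspective, here is a rough back-of-the-envelope sketch in Python of the raw size of a 1700 by 2200 pixel, 24 bpp image and the compression ratio each size corresponds to. It assumes 1 MB = 1,000,000 bytes and uses the approximate file sizes quoted above; `required_ratio` is a helper name made up for this illustration.

```python
# Raw (uncompressed) size of the test image described above.
# Assumption: 1 MB is treated as 1,000,000 bytes.
width, height = 1700, 2200
bytes_per_pixel = 3  # 24 bpp RGB

raw_bytes = width * height * bytes_per_pixel


def required_ratio(target_bytes):
    """Compression ratio (raw : compressed) needed to reach target_bytes."""
    return raw_bytes / target_bytes


print("raw size in bytes:", raw_bytes)
print("ratio at ~7 MB (LZW):", round(required_ratio(7_000_000), 2))
print("ratio at ~5 MB (Deflate):", round(required_ratio(5_000_000), 2))
print("ratio needed for 1 MB:", round(required_ratio(1_000_000), 2))
```

In other words, the raw pixel data is roughly 11.2 MB, so the 1 MB target requires a lossless ratio of about 11:1, compared with the roughly 2:1 that Deflate achieved.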

There must be some algorithm for higher compression; a PDF file containing this image is something like 500 KB.

If a specific algorithm is incompatible with other TIFF viewers/libraries, this is not an issue, as long as we can extract the compressed TIFF from the database and convert it to a System.Drawing.Bitmap using LibTiff.Net or some other library.

How can I generate even smaller files with lossless compression? Is this even possible with these kinds of images?

Update

PDF file
TIFF file

Zev Spitz asked Sep 26 '16 20:09


2 Answers

Simple evaluation of the test-image

Just to give some numbers for the example image (the TIFF one): all compressions are lossless, and each output can be converted back to any other lossless format such as BMP or PNG (this has been checked).

Format             Size (bytes)  % of original  License
tiff-orig          5.779.814
png (unoptimized)  3.084.641     53.37%
png (optimized)    2.795.230     48.36%
png (zopfli)       2.791.680     48.30%
jpeg2000           2.230.967     38.60%
webp               2.021.710     34.98%         BSD
gralic             1.795.457     31.06%
flif               1.778.976     30.78%         LGPL3

Remarks

  • These are the results for just one image
    • Most of these compressors still have potential gains, but reaching them takes a huge amount of compression time
    • The general ordering of compression efficiency among these compressors should hold, but the actual values will change for a larger test set
  • Most of these compressors are designed to handle single images only
    • It would be easy to split the multi-page TIFF into single images, compress each one, and store the connections between them somehow
    • This is also very natural within a DB setup
    • If the images in a multi-page TIFF are strongly correlated, it might be possible to exploit this (e.g. with general-purpose compressors or a custom approach)
  • As I indicated in the comments, the kind of reduction you want is not possible for most types of images (e.g. photos or scans) while sticking to lossless compression
    • There is much more to say, but the most important aspect is that they contain a lot of noise, and noise can't be compressed
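The point about noise can be illustrated with a small sketch that uses only Python's standard library. It is not the benchmark used for the table above: the pixel count, gray level, and noise amplitude are made-up values, and Deflate (via `zlib`) stands in for the image codecs.

```python
import random
import zlib

random.seed(0)  # fixed seed so runs are repeatable

n_pixels = 200_000  # hypothetical size for the demonstration

# A "clean" page: every pixel has the same light-gray value.
clean = bytes([240] * n_pixels)

# The same page with a small amount of sensor-like noise (+/- 3 gray levels).
noisy = bytes(240 + random.randint(-3, 3) for _ in range(n_pixels))


def deflate_size(data):
    """Size in bytes of the data after Deflate at the highest level."""
    return len(zlib.compress(data, 9))


clean_size = deflate_size(clean)
noisy_size = deflate_size(noisy)
print("clean:", clean_size, "bytes")
print("noisy:", noisy_size, "bytes")
```

Even though the noise is barely visible (seven adjacent gray levels), the noisy buffer compresses far worse than the flat one, because each noisy pixel carries close to three bits of unpredictable information that no lossless coder can remove.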

For fun: denoise + lossless-compression

As noise is the most important factor limiting lossless compression, let's remove some. This is done here with the Python code below, but there are many other possible approaches. The code uses a nonlinear filter that tries to remove noise while preserving important edges.

Of course information is lost here, but I actually like the denoised image a bit more, as it's nicer to read (in my opinion).

Code for denoising

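from skimage import img_as_ubyte
from skimage.io import imread, imsave
from skimage.restoration import denoise_bilateral

img = imread("200 DPI.tif")
# Edge-preserving bilateral filter. Recent scikit-image releases use
# sigma_color / channel_axis; older ones used sigma_range / multichannel=True.
img_denoised = denoise_bilateral(img, sigma_color=0.05, sigma_spatial=15, channel_axis=-1)
# The filter returns floats in [0, 1]; convert back to 8-bit before saving as PNG.
imsave("200 DPI_denoised.png", img_as_ubyte(img_denoised))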

Evaluation

flif (denoised) 1.140.497  19.73%


sascha answered Nov 09 '22 16:11


Two parts to the answer:

  • Make it lossy in a way you choose, rather than the way a lossy codec does it. For example, if you are working with scanned text images, do brightness/contrast normalization (possibly local normalization) so the page background is pure white. This improves compressibility a lot: it could turn a 10 MB grayscale text page with an almost but not exactly white background into a 200 kB page with a pure white background and grayscale text (using LZW).

  • Use JPEG2000. If you want the best possible lossless compression, JPEG2000 with lossless settings will likely beat other algorithms such as PNG, especially for content like photos, but also for scanned pages. Storing JPEG2000 inside TIFF containers should also be possible, but it is not a very common feature of TIFF libraries, so you may or may not want to do that. I think JPEG2000 also has a feature for multiple images in one file.
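The first suggestion can be sketched as follows, under loud assumptions: the page is synthetic, the threshold of 230 is an arbitrary value that would need tuning per scanner, the normalization is a simple global cut-off rather than the local normalization mentioned above, and Deflate (Python's `zlib`) is used in place of LZW because the standard library has no LZW codec. `make_page` and `normalize_background` are names invented for this illustration.

```python
import random
import zlib

random.seed(1)  # fixed seed so runs are repeatable

WIDTH, HEIGHT = 400, 300   # hypothetical page size in pixels
WHITE_THRESHOLD = 230      # hypothetical cut-off; tune per scanner


def make_page():
    """Synthetic grayscale 'scan': noisy off-white background, dark text bands."""
    rows = []
    for y in range(HEIGHT):
        row = bytearray(random.randint(240, 254) for _ in range(WIDTH))
        if y % 20 < 8:  # an 8-row band of "text" every 20 rows
            for x in range(40, WIDTH - 40):
                if (x // 6) % 2 == 0:
                    row[x] = random.randint(10, 60)
        rows.append(bytes(row))
    return b"".join(rows)


def normalize_background(pixels, threshold=WHITE_THRESHOLD):
    """Force every pixel at or above the threshold to pure white (255)."""
    return bytes(255 if p >= threshold else p for p in pixels)


page = make_page()
cleaned = normalize_background(page)

before = len(zlib.compress(page, 9))
after = len(zlib.compress(cleaned, 9))
print("before normalization:", before, "bytes")
print("after normalization:", after, "bytes")
```

The dark "text" pixels are left untouched, so the result stays grayscale; only the background noise, which carries no useful information, is discarded before the lossless compressor runs.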

like image 42
Alex I Avatar answered Nov 09 '22 15:11

Alex I