I am trying to archive TIFF images in a database, and I would like to compress the images as much as possible, even at the cost of higher CPU and memory usage.
In order to test the compressions available in LibTiff.NET, I used the following code (modified from this sample):
// getImageRasterBytes and convertSamples are defined in the sample
void Main() {
    foreach (Compression cmp in Enum.GetValues(typeof(Compression))) {
        try {
            using (Bitmap bmp = new Bitmap(@"D:\tifftest\200 COLOR.tif")) {
                using (Tiff tif = Tiff.Open($@"D:\tifftest\output_{cmp}.tif", "w")) {
                    byte[] raster = utils.getImageRasterBytes(bmp, PixelFormat.Format24bppRgb);
                    tif.SetField(TiffTag.IMAGEWIDTH, bmp.Width);
                    tif.SetField(TiffTag.IMAGELENGTH, bmp.Height);
                    tif.SetField(TiffTag.COMPRESSION, cmp);
                    tif.SetField(TiffTag.PHOTOMETRIC, Photometric.RGB);
                    tif.SetField(TiffTag.ROWSPERSTRIP, bmp.Height);
                    tif.SetField(TiffTag.XRESOLUTION, bmp.HorizontalResolution);
                    tif.SetField(TiffTag.YRESOLUTION, bmp.VerticalResolution);
                    tif.SetField(TiffTag.BITSPERSAMPLE, 8);
                    tif.SetField(TiffTag.SAMPLESPERPIXEL, 3);
                    tif.SetField(TiffTag.PLANARCONFIG, PlanarConfig.CONTIG);
                    int stride = raster.Length / bmp.Height;
                    utils.convertSamples(raster, bmp.Width, bmp.Height);
                    for (int i = 0, offset = 0; i < bmp.Height; i++) {
                        tif.WriteScanline(raster, offset, i, 0);
                        offset += stride;
                    }
                }
            }
        } catch (Exception ex) {
            // code was run in LINQPad
            ex.Dump(cmp.ToString());
        }
    }
}
The test image is 200 dpi and 24 bpp, 1700 pixels wide by 2200 pixels high; saved with LZW compression, the file is nearly 7 MB. (The image is representative of the images I want to store.)
Of the algorithms that worked (some failed with various errors), Compression.Deflate produced the smallest file, but that was still 5 MB, and I would like it significantly smaller (under 1 MB).
There must be some algorithm that compresses better; a PDF file containing this image is about 500 KB.
If a specific algorithm is incompatible with other TIFF viewers/libraries, this is not an issue, as long as we can extract the compressed TIFF from the database and convert it to a System.Drawing.Bitmap using LibTiff.Net or some other library.
How can I generate even smaller files with lossless compression? Is this even possible with these kinds of images?
Update
PDF file
TIFF file
Just to give some numbers on the example image (the TIFF one). All compressions are lossless, and each result can be converted back to any other lossless format such as BMP or PNG (this has been checked).
format              size (bytes)   % of original   license
tiff-orig           5.779.814
png (unoptimized)   3.084.641      53.37%
png (optimized)     2.795.230      48.36%
png (zopfli)        2.791.680      48.30%
jpeg2000            2.230.967      38.60%
webp                2.021.710      34.98%          BSD
gralic              1.795.457      31.06%
flif                1.778.976      30.78%          LGPL3
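For anyone who wants to re-check the percentages, here is a small sketch of the arithmetic: each figure is the compressed size divided by the original TIFF size. The byte counts are copied from the numbers above (the dots there are thousands separators); the name `percent_of_original` is only illustrative.

```python
# Byte counts copied from the list above.
ORIGINAL = 5_779_814  # tiff-orig

sizes = {
    "png (unoptimized)": 3_084_641,
    "png (optimized)": 2_795_230,
    "png (zopfli)": 2_791_680,
    "jpeg2000": 2_230_967,
    "webp": 2_021_710,
    "gralic": 1_795_457,
    "flif": 1_778_976,
}


def percent_of_original(size, original=ORIGINAL):
    """Return the compressed size as a percentage of the original size."""
    return 100.0 * size / original


for name, size in sizes.items():
    print(f"{name:18s} {size:>9d} {percent_of_original(size):6.2f}%")
```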
Since noise is the main thing limiting lossless compression, let's remove some of it. The Python code below is one way to do this; there are many other possible approaches. It uses a nonlinear (bilateral) filter that tries to remove noise while keeping important edges.
Of course, information is lost here, but I actually like the denoised image a bit more, as it is easier to read (in my opinion).
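from skimage.io import imread, imsave
from skimage.restoration import denoise_bilateral
img = imread("200 DPI.tif")
# Edge-preserving bilateral filter applied to the RGB image.
# Note: newer scikit-image releases renamed sigma_range to sigma_color
# and replaced multichannel=True with channel_axis=-1.
img_denoised = denoise_bilateral(img, multichannel=True, sigma_range=0.05, sigma_spatial=15)
imsave("200 DPI_denoised.png", img_denoised)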
flif (denoised) 1.140.497 19.73%
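To see why noise matters so much, here is a minimal sketch on synthetic data. It is not the scanned image and not FLIF: it uses Python's standard `zlib` (Deflate) as a stand-in codec, and the image size and the noise amplitude of ±3 gray levels are arbitrary assumptions chosen for illustration.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256          # arbitrary size for the synthetic image
rng = random.Random(0)            # fixed seed so runs are repeatable


def compressed_size(data: bytes) -> int:
    """Size in bytes after Deflate at the highest compression level."""
    return len(zlib.compress(data, 9))


# Clean image: every row is the same smooth gradient 0..255.
clean = bytes(x for _ in range(HEIGHT) for x in range(WIDTH))

# Noisy image: the same gradient with a small random offset per pixel,
# clamped to the valid 8-bit range.
noisy = bytes(min(255, max(0, v + rng.randint(-3, 3))) for v in clean)

print("raw bytes:       ", len(clean))
print("clean compressed:", compressed_size(clean))
print("noisy compressed:", compressed_size(noisy))
```

Even a few gray levels of noise destroy the exact repetitions a lossless coder relies on, so the noisy version compresses far worse than the clean one.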
Two parts to the answer:
Make it lossy in a way you choose, rather than the way a lossy codec does it. For example, if you are working with scanned text images, do brightness/contrast normalization (possibly local normalization) so the page background is pure white. This will improve compressibility by a lot; it could turn a 10 MB grayscale text page with an almost but not exactly white background into a 200 kB page with a pure white background and grayscale text (using LZW).
Use JPEG2000. If you want the best possible lossless compression, JPEG2000 with lossless settings will likely beat other algorithms such as PNG, especially for content like photos, but also for scanned pages. Storing your JPEG2000 data inside TIFF containers should also be possible, but it is not a very common feature of TIFF libraries; you may or may not want to do that. I think JPEG2000 also has a feature for multiple images in one file.
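The background-whitening idea from the first part can be sketched roughly as follows. This is only an illustration on a synthetic grayscale "page", with Deflate from Python's `zlib` standing in for LZW; the threshold of 240 and the helper names `make_page` and `whiten_background` are assumptions, and a real scan would probably need local normalization rather than one global cutoff.

```python
import random
import zlib

WIDTH, HEIGHT = 200, 200
WHITE_THRESHOLD = 240             # assumed cutoff: anything this bright counts as paper
rng = random.Random(1)            # fixed seed so runs are repeatable


def compressed_size(data: bytes) -> int:
    """Size in bytes after Deflate at the highest compression level."""
    return len(zlib.compress(data, 9))


def make_page() -> bytes:
    """Synthetic 8-bit grayscale scan: off-white paper with dark bars as 'text'."""
    pixels = bytearray()
    for y in range(HEIGHT):
        for x in range(WIDTH):
            if y % 20 < 4 and 20 <= x < 180:
                pixels.append(rng.randint(0, 40))      # dark 'ink'
            else:
                pixels.append(rng.randint(244, 255))   # almost-white background
    return bytes(pixels)


def whiten_background(pixels: bytes, threshold: int = WHITE_THRESHOLD) -> bytes:
    """Force every pixel at or above the threshold to pure white (255)."""
    return bytes(255 if p >= threshold else p for p in pixels)


page = make_page()
cleaned = whiten_background(page)

print("original compressed:", compressed_size(page))
print("whitened compressed:", compressed_size(cleaned))
```

The ink pixels are untouched, so the loss is confined to the paper texture, which is the part you chose to throw away.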