
JPEG compression differences in C# and Python

I am moving some image processing functionality from .NET to Python under the constraint that the output images must be compressed in exactly the same way as they were in .NET. However, when I compare the output .jpg files with a tool like text-compare, with the "Ignore nothing" option selected, there are significant differences in how the files were compressed.

For example:

Python

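import PIL.Image

# Load the source bitmap and re-encode it as a baseline JPEG.
bmp = PIL.Image.open('marbles.bmp')

bmp.save(
    'output_python.jpg',
    format='jpeg',
    dpi=(300, 300),
    subsampling=2,  # 4:2:0 chroma subsampling
    quality=75
)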

.NET

// Requires: using System.Drawing; using System.Drawing.Imaging; using System.Linq;
ImageCodecInfo jpgEncoder = ImageCodecInfo.GetImageEncoders().First(codec => codec.FormatID == ImageFormat.Jpeg.Guid);
EncoderParameters myEncoderParameters = new EncoderParameters(1);
myEncoderParameters.Param[0] = new EncoderParameter(Encoder.Quality, 75L);

Bitmap bmp = new Bitmap(directory + "marbles.bmp");

bmp.Save(directory + "output_net.jpg", jpgEncoder, myEncoderParameters);

exiftool output_python.jpg -a -G1 -w txt

[ExifTool]      ExifTool Version Number         : 12.31
[System]        File Name                       : output_python.jpg
[System]        Directory                       : .
[System]        File Size                       : 148 KiB
[System]        File Modification Date/Time     : 2021:09:28 09:19:20-06:00
[System]        File Access Date/Time           : 2021:09:28 09:19:21-06:00
[System]        File Creation Date/Time         : 2021:09:27 21:33:35-06:00
[System]        File Permissions                : -rw-rw-rw-
[File]          File Type                       : JPEG
[File]          File Type Extension             : jpg
[File]          MIME Type                       : image/jpeg
[File]          Image Width                     : 1419
[File]          Image Height                    : 1001
[File]          Encoding Process                : Baseline DCT, Huffman coding
[File]          Bits Per Sample                 : 8
[File]          Color Components                : 3
[File]          Y Cb Cr Sub Sampling            : YCbCr4:2:0 (2 2)
[JFIF]          JFIF Version                    : 1.01
[JFIF]          Resolution Unit                 : inches
[JFIF]          X Resolution                    : 300
[JFIF]          Y Resolution                    : 300
[Composite]     Image Size                      : 1419x1001
[Composite]     Megapixels                      : 1.4

exiftool output_net.jpg -a -G1 -w txt

[ExifTool]      ExifTool Version Number         : 12.31
[System]        File Name                       : output_net.jpg
[System]        Directory                       : .
[System]        File Size                       : 147 KiB
[System]        File Modification Date/Time     : 2021:09:28 09:18:05-06:00
[System]        File Access Date/Time           : 2021:09:28 09:18:52-06:00
[System]        File Creation Date/Time         : 2021:09:27 21:32:19-06:00
[System]        File Permissions                : -rw-rw-rw-
[File]          File Type                       : JPEG
[File]          File Type Extension             : jpg
[File]          MIME Type                       : image/jpeg
[File]          Image Width                     : 1419
[File]          Image Height                    : 1001
[File]          Encoding Process                : Baseline DCT, Huffman coding
[File]          Bits Per Sample                 : 8
[File]          Color Components                : 3
[File]          Y Cb Cr Sub Sampling            : YCbCr4:2:0 (2 2)
[JFIF]          JFIF Version                    : 1.01
[JFIF]          Resolution Unit                 : inches
[JFIF]          X Resolution                    : 300
[JFIF]          Y Resolution                    : 300
[Composite]     Image Size                      : 1419x1001
[Composite]     Megapixels                      : 1.4

[Image: marbles.bmp sample image]

[Image: Difference on text-compare]

[Image: Marbles difference details]

Questions

  • Is it reasonable to assume that these two implementations of JPEG compression could yield identical output files?
  • If so, is either PIL or System.Drawing.Image performing extra steps, such as anti-aliasing, that make the results different?
  • Or are there additional parameters to PIL .save() to make it behave more like the JPEG encoder in C#?

Thanks

Update

Based on Jeremy's recommendation, I used JPEGsnoop to compare more details between the files and found that the Luminance and Chrominance tables were different. I modified the code:

bmp = PIL.Image.open('marbles.bmp')

output_net = PIL.Image.open('output_net.jpg')

bmp.save(
    'output_python.jpg',
    format='jpeg',
    dpi=(300,300),
    subsampling=2,
    qtables=output_net.quantization,
    #quality=75
)

Now the tables are the same, but the difference between the files is unchanged. The only differences JPEGsnoop shows now are in the Compression stats and Huffman code histogram stats.
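As a cross-check on the JPEGsnoop comparison, the header segments of the two files can also be compared byte for byte. The following is a minimal sketch using only the Python standard library. The function names `header_segments` and `differing_segments` are made up for this illustration, and the parser assumes a well-formed baseline file in which every marker before the scan data carries a length field.

```python
import struct
from itertools import zip_longest

# A few marker codes from the JPEG standard; anything else is shown as hex.
MARKER_NAMES = {0xE0: "APP0", 0xDB: "DQT", 0xC0: "SOF0", 0xC4: "DHT", 0xDA: "SOS"}


def header_segments(data):
    """Return (name, payload) for each segment from SOI up to and including SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        if code == 0xFF:  # fill byte before the real marker code
            pos += 1
            continue
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        segments.append((MARKER_NAMES.get(code, f"0x{code:02X}"), payload))
        pos += 2 + length
        if code == 0xDA:  # entropy-coded scan data follows; stop here
            break
    return segments


def differing_segments(a, b):
    """Names of header segments that are not byte-identical at the same position."""
    pairs = zip_longest(header_segments(a), header_segments(b))
    return [(x or y)[0] for x, y in pairs if x != y]


# Usage with the file names from the question:
# with open("output_net.jpg", "rb") as f_net, open("output_python.jpg", "rb") as f_py:
#     print(differing_segments(f_net.read(), f_py.read()))
```

If this returns an empty list for the two output files, every header segment up to the start of the scan matches, and the remaining difference lies in the entropy-coded scan data itself.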

output_net.jpg

*** Decoding SCAN Data ***
  OFFSET: 0x0000026F
  Scan Decode Mode: Full IDCT (AC + DC)

  Scan Data encountered marker   0xFFD9 @ 0x00024BE7.0

  Compression stats:
    Compression Ratio: 28.43:1
    Bits per pixel:     0.84:1

  Huffman code histogram stats:
    Huffman Table: (Dest ID: 0, Class: DC)
      # codes of length 01 bits:        0 (  0%)
      # codes of length 02 bits:     1664 (  7%)
      # codes of length 03 bits:    18238 ( 81%)
      # codes of length 04 bits:     1807 (  8%)
      # codes of length 05 bits:      715 (  3%)
      # codes of length 06 bits:        4 (  0%)
      # codes of length 07 bits:        0 (  0%)
      ...

output_python.jpg

*** Decoding SCAN Data ***
  OFFSET: 0x0000026F
  Scan Decode Mode: Full IDCT (AC + DC)

  Scan Data encountered marker   0xFFD9 @ 0x00025158.0

  Compression stats:
    Compression Ratio: 28.17:1
    Bits per pixel:     0.85:1

  Huffman code histogram stats:
    Huffman Table: (Dest ID: 0, Class: DC)
      # codes of length 01 bits:        0 (  0%)
      # codes of length 02 bits:     1659 (  7%)
      # codes of length 03 bits:    18247 ( 81%)
      # codes of length 04 bits:     1807 (  8%)
      # codes of length 05 bits:      711 (  3%)
      # codes of length 06 bits:        4 (  0%)
      # codes of length 07 bits:        0 (  0%)
      ...

I am now looking for a way to sync these values through PIL.
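The histogram figures above allow a quick sanity check. With 4:2:0 subsampling each MCU covers 16x16 pixels and holds four 8x8 luminance blocks, so the number of DC luminance codes should be four times the MCU count. The sketch below works this out for the 1419x1001 image, assuming (as is typical, but not confirmed by the output shown) that Huffman table 0 is used only for the luminance component and that the elided histogram rows are zero. The helper name `mcu_grid` is made up for this illustration.

```python
import math


def mcu_grid(width, height, h_samp=2, v_samp=2):
    """MCU columns and rows; sampling factors of 2x2 give 16x16-pixel MCUs."""
    mcu_w, mcu_h = 8 * h_samp, 8 * v_samp
    return math.ceil(width / mcu_w), math.ceil(height / mcu_h)


# DC code counts for code lengths 1..7 bits, copied from the JPEGsnoop output.
dc_counts_net = [0, 1664, 18238, 1807, 715, 4, 0]
dc_counts_python = [0, 1659, 18247, 1807, 711, 4, 0]

cols, rows = mcu_grid(1419, 1001)
luma_blocks = cols * rows * 4  # four 8x8 Y blocks per 4:2:0 MCU

print("MCU grid:", cols, "x", rows)
print("expected luminance blocks:", luma_blocks)
print("DC codes (.NET):", sum(dc_counts_net))
print("DC codes (Python):", sum(dc_counts_python))
```

Both totals come to 22428, which equals 89 x 63 x 4, so the two files encode the same number of blocks. Only the distribution of code lengths differs. If the Huffman tables are indeed identical, that would mean a small number of quantized DC values differ between the files, which points at arithmetic earlier in the pipeline rather than at the tables.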

asked Sep 28 '21 16:09 by J. Mac




1 Answer

Is it reasonable to assume that these two implementations of JPEG compression could yield identical output files?

Not really.

JPEG is a lossy format: it trades fidelity for a high compression ratio. Even at a quality setting of 100 some loss is inevitable, because the color conversion, the DCT, and the quantization step all round their results to finite precision.

Two encoders can produce identical files, but only if they implement every step identically and use the same parameters: the arithmetic precision and rounding, the quantization and Huffman tables, the chroma subsampling filter, and the way edge blocks are padded to a whole number of 8x8 DCT blocks.

Some implementations also run a pre-pass to optimize encoding parameters, for example to build image-specific Huffman tables.

Because these choices differ between the two implementations, their outputs are unlikely to be identical.
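How small a difference is enough can be illustrated with a toy example. The sketch below converts RGB to luma with the standard JFIF weights in two ways that differ only in the final rounding. Both variants are hypothetical; nothing here is claimed to be what PIL or System.Drawing actually does.

```python
def luma_round(r, g, b):
    """JFIF luma weights, rounded to the nearest integer (hypothetical encoder A)."""
    return int(0.299 * r + 0.587 * g + 0.114 * b + 0.5)


def luma_truncate(r, g, b):
    """Same weights, fractional part dropped (hypothetical encoder B)."""
    return int(0.299 * r + 0.587 * g + 0.114 * b)


# Pure green lands on different luma values in the two variants.
print(luma_round(0, 255, 0), luma_truncate(0, 255, 0))

# Count how many levels of a green ramp are affected.
mismatches = sum(luma_round(0, g, 0) != luma_truncate(0, g, 0) for g in range(256))
print(mismatches, "of 256 green levels differ")
```

For pure green the first variant gives 150 and the second 149. A one-level difference like this, once it passes through the DCT and quantization, can change a coefficient and therefore the bytes of the compressed stream, even though the decoded images look the same.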


Are there additional parameters to PIL .save() to make it behave more like the JPEG encoder in C#?

I cannot answer this question directly, but you can use the Python for .NET package (pythonnet) to call the C# JPEG encoder from Python. Because the same encoder does the work, the output should be identical to what the .NET code produced.


Why would anyone need binary compatibility, other than the educational value?

In every practical scenario I can think of where this question comes up, the real requirement is that a stored hash of the image keeps matching. In that case, store the hash of the newly encoded image in a separate field next to the old one.
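A minimal sketch of the separate-field idea, using only the standard library. The `ImageRecord` class, its field names, and the helper functions are hypothetical and stand in for whatever storage the application already uses.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRecord:
    legacy_hash: str                    # digest of the file written by the old encoder
    current_hash: Optional[str] = None  # digest of the file written by the new encoder


def digest(data: bytes) -> str:
    """SHA-256 hex digest of the encoded file contents."""
    return hashlib.sha256(data).hexdigest()


def is_known(record: ImageRecord, data: bytes) -> bool:
    """True if the bytes match either the legacy or the current encoding."""
    return digest(data) in (record.legacy_hash, record.current_hash)


# Example with placeholder bytes instead of real JPEG files.
record = ImageRecord(legacy_hash=digest(b"bytes written by the old encoder"))
record.current_hash = digest(b"bytes written by the new encoder")
print(is_known(record, b"bytes written by the new encoder"))
```

With both digests stored, files from either encoder can be recognized during the migration without requiring the encoders to agree byte for byte.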

Pick a technology and use it until it no longer fits your needs/requirements. When it doesn't (preferably well before), find shims to fill the gap and rewrite the code to utilize the new technology.

answered Oct 27 '22 22:10 by Strom