I've been trying to figure this out by reverse engineering a .png file I created in GIMP. It's 4x4 pixels. My goal is to decode the raw pixels out of the file with the intent of reversing this to encode.
Here is a full hex dump of the file:
89504E47 0D0A1A0A 0000000D 49484452 00000004 00000004
08020000 00269309 29000000 3F494441 54081D01 3400CBFF
01CC96B1 134FE120 C0CECDF1 5101FFA5 60000000 000000E0
403201DF E59286DF 6D000000 00000004 EDB11F00 2E007A21
93EDB11F 3063136F 4733525A 00000000 49454E44 AE426082
According to the spec, we start with the PNG signature which is the first 8 bytes.
89504E47 0D0A1A0A
We then have repeating "chunk" structures, this file has 3 "chunks", the header (IHDR), the image data (IDAT) and then the end "chunk" (IEND).
Each chunk is arranged into: the first 4 bytes for the length of the chunk data, the next 4 bytes for the type of data, then n-bytes for the actual data and then 4 bytes for the cyclic redundancy check (CRC) of the type of data and actual data sections.
Following this through...
0000000D
Is the chunk's data length (13 bytes).
49484452
Is the chunk's type (IHDR).
00000004 00000004 08020000 00
Is the chunk's data (4 bytes width, height; 1 byte bit depth, colour type, compression method, filter method, interlace method).
269309 29
Is the CRC of the data and type (managed to get the code to work this out from here.
000000 3F
Is the next chunk's data length (63 bytes).
494441 54
Is the chunk's type (IDAT).
081D01 3400CBFF 01CC96B1 134FE120 C0CECDF1 5101FFA5 60000000 000000E0 403201DF E59286DF 6D000000 00000004 EDB11F00 2E007A21 93EDB11F 3063136F
Is the chunk's actual data (the image data compressed and filtered).
So my actual question is how do I decode this last section into raw pixels?
According to the spec, I must first decompress the data (INFLATE?) and then unfilter it (??) to be left with scanlines of pixels (my goal).
If this could be explained in pseudocode that would be amazing! Otherwise I'm familiar with Swift and less so with C...
The PNG format makes use of delta encoding in a format that it calls “filtering”. Basically, for each scan-line of pixels, a current pixel is encoded in some relation to the pixel to the left, the pixel above, and the pixel above-left.
PNG images use a binary format because efficiency is important in creating small image files. However, PNG does binary formats right: it specifies byte orders and word lengths to avoid the NUXI problem. There are often business reasons to use binary formats.
This book explains the process of decoding for programmers:
https://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434
The entire process is way to complicated to fit within the space of a SO answer. PNG uses two different methods of compression: LZ and Huffman encoding.
I think pngcheck
will help you considerably:
pngcheck -vv result.png
Sample Output
File: result.png (334985 bytes)
chunk IHDR at offset 0x0000c, length 13
600 x 450 image, 24-bit RGB, non-interlaced
chunk IDAT at offset 0x00025, length 65536
zlib: deflated, 32K window, default compression
row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
1 4 4 4 4 4 4 4 1 4 4 4 4 4 4 4 1 4 4 4 4 4 4 4 4
1 4 4 1 4 1 4 1 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 4 4 1 1 1 1 1 1 1 4 4 1 4 1 1 4
4 4 4 4 4 1 4 4 4 4 4 4 4 1 4 4 4 4 4 4 4 (96 out of 450)
chunk IDAT at offset 0x10031, length 65536
row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4
4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4 4 (188 out of 450)
chunk IDAT at offset 0x2003d, length 65536
row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
4 4 4 4 4 4 4 4 4 4 4 4 1 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
4 1 4 4 4 4 4 4 4 4 (273 out of 450)
chunk IDAT at offset 0x30049, length 65536
row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
4 4 4 4 4 4 4 4 4 2 4 4 4 4 4 1 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4 4 4 4 4 4 1 4 4 4
4 4 4 4 1 4 4 4 (356 out of 450)
chunk IDAT at offset 0x40055, length 65536
row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
4 4 4 4 1 4 4 4 4 4 4 4 1 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 1 4 4 4 4 4 4 4 1 1 1 4 4 1 4 4 1 1 1 4 4 4
4 4 1 1 1 4 1 4 4 4 1 4 1 4 4 4 1 4 1 4 4 4 4 4 4
4 1 1 4 1 4 4 1 4 1 (441 out of 450)
chunk IDAT at offset 0x50061, length 7188
row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
4 1 1 4 4 4 4 1 4 (450 out of 450)
chunk IEND at offset 0x51c81, length 0
No errors detected in result.png (8 chunks, 58.6% compression).
Your question is way too broad for pseudocode -- or at least, actionable pseudocode - if we go high-level enough, you already know the basics: 1. uncompress, 2. unfilter, 3. profit.
Since you state in comments that you want to do everything yourself, I propose that you start by implementing the default decompression algorithm, ZLIB CM=8 "deflate", which is well-described in https://www.ietf.org/rfc/rfc1951.txt among others.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With