Parsing GIF Raster Data - LZW

I've been trying to decompress GIFs in PHP and seem to have everything except the LZW decompression down. I am testing with a small sample image that I saved.

This image is 3 x 5 like this:

Blue  Black Black
Black Blue  Black
Black Black Black
White White White
White White White

I decided to parse this file manually, byte by byte, in binary. The result of the manual parsing is below. I am still stuck on how to decode the raster data. Can someone break down how the raster data becomes the image? I've been able to break down one other image, but not this one. I have posted my understanding of how this should break down, but I am obviously doing it wrong.

01000111 G
01001001 I
01000110 F
00111000 8
00111001 9
01100001 a

Screen Descriptor
WIDTH
00000011 3
00000000

HEIGHT
00000101 5
00000000

10010001 GCM (1), CR (001), BPP (001), CD = 2, COLORS = 4

00000000 BGCOLOR Index

00000000 Aspect Ratio

GCM
BLUE
00110101 | 53
00000000 | 0
11000001 | 193

WHITE
11111111 | 255
11111111 | 255
11111111 | 255

BLACK
00000000 | 0
00000000 | 0
00000000 | 0

00000000 | 0
00000000 | 0
00000000 | 0

Extension
00100001 | 21
Function Code
11111001 | F9
Length
00000100 | 4
00000000
00000000
00000000
00000000
Terminator
00000000

Local Descriptor
00101100 Header
XPOS
00000000 | 0
00000000

YPOS
00000000 | 0
00000000

Width
00000011 | 3
00000000

Height
00000101 | 5
00000000

Flags
00000000 (LCM = 0, Interlaced = 0, Sorted = 0, Reserved = 0, Pixel Bits = 0)

RASTER DATA
Initial Code Size
00000010 | 2
Length
00000101 | 5

Data
10000100
01101110
00100111
11000001
01011101

Terminator
00000000

00111011 | ;
00000000

My Attempt

10000100
01101110
00100111
11000001
01011101

Initial Code Size = 3 Read 2 bits at a time

10
00
Append last bit to first (010)
String becomes 010 or 2. 2 would be color # 3 or BLACK

At this point, I am already wrong. The first color should be blue.

Resources I have been using:

http://www.daubnet.com/en/file-format-gif
http://en.wikipedia.org/wiki/Graphics_Interchange_Format
http://www.w3.org/Graphics/GIF/spec-gif87.txt

teynon asked Jan 07 '13 20:01


2 Answers

GIF parser

You said you want to write your own GIF parser in order to understand how it works. I suggest you look at the source code of any of the libraries containing GIF readers, such as the de facto reference implementation, GIFLIB. The relevant source file is dgif_lib.c; start at DGifSlurp for decoding, or jump to the LZW decompression implementation.

Here's how your image decodes.

I think the issue was that you were splitting the input bytes into LZW codes incorrectly.

Number of colors is 2^(0b001 + 1) = 4.
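As a rough sketch of where that number comes from, the packed byte of the screen descriptor (10010001 in the question) can be unpacked as below. The function name `parse_screen_packed` and the dictionary keys are made up for illustration; the bit positions are the ones given for the Logical Screen Descriptor in the GIF89a spec.

```python
def parse_screen_packed(packed: int) -> dict:
    """Split the screen descriptor's packed byte into its fields."""
    size_field = packed & 0b111  # bits 0-2: N, where the table holds 2^(N+1) colors
    return {
        "global_color_table": bool(packed >> 7),     # bit 7
        "color_resolution": (packed >> 4) & 0b111,   # bits 4-6
        "sorted": bool((packed >> 3) & 1),           # bit 3
        "table_entries": 2 ** (size_field + 1),
    }


fields = parse_screen_packed(0b10010001)
print(fields["table_entries"])  # 4
```

The table size is a power of two, so a size field of 2 would mean 8 colors and a size field of 7 would mean 256.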

Code size starts at 2 + 1 = 3 bits.

So the initial dictionary is

000 = color 0 = [blue]
001 = color 1 = [white]
010 = color 2 = [black]
011 = color 3 = [black]
100 = clear dictionary
101 = end of data
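The same starting dictionary can be built from the "initial code size" byte of the raster data (2 in this file). This is only a sketch; `initial_table` is a hypothetical helper name, and each entry is stored as a list of color indices.

```python
def initial_table(min_code_size: int):
    """Return (table, clear_code, end_code, first_code_width) for GIF LZW."""
    clear_code = 1 << min_code_size            # 100 when min_code_size is 2
    end_code = clear_code + 1                  # 101
    # One single-pixel string per color index.
    table = {i: [i] for i in range(clear_code)}
    return table, clear_code, end_code, min_code_size + 1


table, clear_code, end_code, code_width = initial_table(2)
print(table)                              # {0: [0], 1: [1], 2: [2], 3: [3]}
print(clear_code, end_code, code_width)   # 4 5 3
```

The first code added during decoding is therefore 110 (6), the value right after the end-of-data code.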

Now, GIF packs LZW codes into bytes in LSB-first order. Accordingly, the first code is stored as the 3 least-significant bits of the first byte; the second code as the next 3 bits; and so on. In your example (first byte: 0x84 = 10000100), the first 2 codes are thus 100 (clear) and 000 (blue). The whole thing

01011101 11000001 00100111 01101110 10000100

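(bytes listed last to first, so that the bit stream reads right to left) is split into codes as follows; the code width grows to 4 bits once dictionary entry 111 has been added, because the next entry would no longer fit in 3 bits: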

0101 1101 1100 0001 0010 0111 0110 111 010 000 100
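The splitting above can be reproduced with a small LSB-first bit reader. This is a minimal sketch, not GIFLIB's code: `unpack_codes` is a hypothetical name, it assumes the sub-blocks have already been concatenated into one byte string, and it tracks only how many dictionary entries exist (not their contents) so it knows when the code width grows.

```python
def unpack_codes(data: bytes, min_code_size: int) -> list[int]:
    """Split GIF LZW data into variable-width codes, LSB first."""
    clear_code = 1 << min_code_size
    end_code = clear_code + 1
    code_size = min_code_size + 1
    next_code = end_code + 1   # index the next dictionary entry would get
    have_prev = False          # the first code after a clear adds no entry

    codes = []
    bit_buffer = bit_count = pos = 0
    while True:
        # Refill: each new byte goes above the bits still waiting in the buffer.
        while bit_count < code_size:
            if pos >= len(data):
                return codes
            bit_buffer |= data[pos] << bit_count
            bit_count += 8
            pos += 1
        code = bit_buffer & ((1 << code_size) - 1)
        bit_buffer >>= code_size
        bit_count -= code_size
        codes.append(code)

        if code == end_code:
            return codes
        if code == clear_code:
            code_size = min_code_size + 1
            next_code = end_code + 1
            have_prev = False
            continue
        if have_prev:
            if next_code < 4096:
                next_code += 1
            # Widen once the next entry no longer fits (12 bits is the maximum).
            if next_code == (1 << code_size) and code_size < 12:
                code_size += 1
        have_prev = True


data = bytes([0b10000100, 0b01101110, 0b00100111, 0b11000001, 0b01011101])
print(unpack_codes(data, 2))  # [4, 0, 2, 7, 6, 7, 2, 1, 12, 13, 5]
```

Read in order, the printed list is the row of codes above taken from right to left: 100, 000, 010, 111, 0110, 0111, 0010, 0001, 1100, 1101, 0101.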

These codes decode as follows:

     last
code code
 100      clear dictionary
 000      output [blue] (1st pixel)
 010  000 new code in table:
              output 010 = [black]
              add 110 = old + 1st byte of new = [blue black] to table
 111  010 new code not in table:
              output last string followed by copy of first byte, [black black]
              add 111 = [black black] to table
              111 is largest possible 3-bit code, so switch to 4 bits
0110 0111 new code in table:
              output 0110 = [blue black]
              add 1000 = old + 1st byte of new = [black black blue] to table
0111 0110 new code in table:
              output 0111 = [black black]
              add 1001 = old + 1st byte of new = [blue black black] to table
...

So the output starts (wrapping to 3 columns):

blue  black black
black blue  black
black black ...

which is what you wanted.
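For completeness, the whole trace can be checked with a compact decoder. This is a sketch under a few assumptions: `lzw_decode` is a hypothetical name, the input is the five data bytes with the sub-block length and terminator already stripped, and error handling is limited to rejecting codes that cannot exist yet.

```python
def lzw_decode(data: bytes, min_code_size: int) -> list[int]:
    """Decode GIF LZW data into a flat list of color indices."""
    clear_code = 1 << min_code_size
    end_code = clear_code + 1

    def fresh_table():
        # One string per color, plus two placeholder slots for clear/end.
        return [[i] for i in range(clear_code)] + [None, None]

    table = fresh_table()
    code_size = min_code_size + 1
    prev = None                      # string emitted for the previous code
    out = []
    bit_buffer = bit_count = pos = 0

    while True:
        while bit_count < code_size:           # LSB-first refill
            if pos >= len(data):
                return out
            bit_buffer |= data[pos] << bit_count
            bit_count += 8
            pos += 1
        code = bit_buffer & ((1 << code_size) - 1)
        bit_buffer >>= code_size
        bit_count -= code_size

        if code == clear_code:
            table = fresh_table()
            code_size = min_code_size + 1
            prev = None
            continue
        if code == end_code:
            return out

        if code < len(table):                  # code already in the table
            entry = table[code]
        elif code == len(table) and prev is not None:
            entry = prev + [prev[0]]           # code not in the table yet
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.extend(entry)

        if prev is not None and len(table) < 4096:
            table.append(prev + [entry[0]])    # old string + 1st pixel of new
            if len(table) == (1 << code_size) and code_size < 12:
                code_size += 1
        prev = entry


data = bytes([0x84, 0x6E, 0x27, 0xC1, 0x5D])
names = ["blue", "white", "black", "black"]    # the global color table
pixels = lzw_decode(data, 2)
for row in range(5):
    print(" ".join(names[p] for p in pixels[row * 3:row * 3 + 3]))
# blue black black
# black blue black
# black black black
# white white white
# white white white
```

Both branches of the trace appear here: a code already in the table emits its stored string, while a code equal to the table length emits the previous string plus its own first pixel.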

Mechanical snail answered Nov 13 '22 08:11


This site is an excellent resource about the GIF format, and offers a great explanation of the LZW compression and decompression process:

http://www.matthewflickinger.com/lab/whatsinagif/index.html

Thomas Levesque answered Nov 13 '22 07:11