Lempel-Ziv-Welch decompression non-existent index

Tags:

I have an implementation off the LZW compression/decompression algorithm and for the most part have it squared away. However, I have encountered a problem with one of the files that I am testing. Below is the text for said file

#include "bits.h"

int check_endianness(){
    int i = 1;

The area that my implementation is getting stuck on is the group of spaces right before int i = 1; Below I have included my compression and decompression loop respectively along with their relative debug outputs.

compression loop

i=0;
while(i < input_len && ret == LZW_ERR_OK){
    //get next byte
    char curChar = input_char(f_input, &io_flags);
    i++;

    //not necessary to check for stream end here since the while condition does that
    if(io_flags == STREAM_ERR_READ){
        ret = LZW_ERR_READ;
        break;
    }

    seqset(&temp, &curChar, 1);

    //add bytes to temp until a sequence is found that is not in lookup
    while(i < input_len && dictionary_has_entry(lookup, temp)){
        char curChar = input_char(f_input, &io_flags);
        i++;

        //check for errors / end of file
        if(io_flags != STREAM_ERR_OK){
            if(io_flags == STREAM_ERR_READ)
                ret = LZW_ERR_READ;
            break;
        }

        seqadd(&temp, &curChar, 1);
    }

    if(temp.length > 1){
        #ifdef DEBUG
        printf("Adding entry %d: ", lookup->next_value);
        for(int j = 0; j < temp.length; j++)
            printf("%.4x ", temp.data[j]);
        printf("\n");
        #endif // DEBUG
        dictionary_set_entry(lookup, temp, DICT_SET_FOREVER);
        temp.length--; //if temp.length == 1, then the EOF was probably reached
        i--; //This is important so that the entire file is read
    }

    fseek(f_input, -1, SEEK_CUR);
    output_short(f_output, dictionary_get_entry_byKey(lookup, temp)->value, STREAM_USE_ENDIAN);
    #ifdef DEBUG
    printf("Writing code: %d\n", dictionary_get_entry_byKey(lookup, temp)->value);
    #endif // DEBUG
}

compression output

Adding entry 297: 007b 000d
Writing code: 123
Adding entry 298: 000d 000a 0020
Writing code: 273
Adding entry 299: 0020 0020
Writing code: 32
Adding entry 300: 0020 0020 0020
Writing code: 299
Adding entry 301: 0020 0069
Writing code: 32
Adding entry 302: 0069 006e 0074 0020

decompression loop

i=0;
while(i < input_len && ret == LZW_ERR_OK){
    short code;
    entry *e;
    code = input_short(f_input, &io_flags, STREAM_USE_ENDIAN);
    if(io_flags == STREAM_ERR_READ){
        ret = LZW_ERR_READ;
        break;
    }

    i += 2;

    //e should never be NULL
    printf("Reading code: %d\n", code);
    e = dictionary_get_entry_byValue(lookup, code);
    assert(e != NULL);

    seqset(&temp, e->key.data, e->key.length);

    //requires a slightly different approach to the lookup loop in lzw_encode
    while(i < input_len && e != NULL){
        code = input_short(f_input, &io_flags, STREAM_USE_ENDIAN);
        //check for errors / end of file
        if(io_flags != STREAM_ERR_OK){
            if(io_flags == STREAM_ERR_READ)
                ret = LZW_ERR_READ;
            break;
        }
        i += 2;
        printf("Reading code: %d\n", code);

        //e should never be NULL
        e = dictionary_get_entry_byValue(lookup, code);
        assert(e != NULL); <------------ This is where it is failing

        //start adding bytes to temp
        for(size_t j = 0; j < e->key.length; j++){
            seqadd(&temp, &e->key.data[j], 1);
            if(dictionary_get_entry_byKey(lookup, temp) == NULL){

                //sequence not found, add it to dictionary
                printf("Adding entry %d: ", lookup->next_value);
                dictionary_set_entry(lookup, temp, DICT_SET_FOREVER);
                for(int k = 0; k < temp.length; k++)
                    printf("%.4x ", temp.data[k]);
                printf("\n");
                e = NULL; //to escape from while
                break;
            }
        }
    }

    //if more than one code was read go back by two bytes
    if(e == NULL){
        i -= 2;
        fseek(f_input, -2, SEEK_CUR);
    }

    //only write up to the characters that made a known sequence
    temp.length--;
    for(size_t j = 0; j < temp.length; j++){
        output_char(f_output, temp.data[j]);
        #ifdef DEBUG
        //printf("%c", temp.data[j]);
        #endif // DEBUG
    }
}

decompression output

Reading code: 123
Reading code: 273
Adding entry 297: 007b 000d
Reading code: 273
Reading code: 32
Adding entry 298: 000d 000a 0020
Reading code: 32
Reading code: 299
299, 299 <----error output from dictionary (code asked > next value)
Assertion failed: e != NULL, file lzw.c, line 212

Any help would be much appreciated.

740

asked Feb 09 '17 07:02

Marcos

1 Answers

You hit the infamous KwKwK problem in the Lempel Ziv Welch decompression algorithm.

From the original article, A Technique for High-Performance Data Compression, Terry A. Welch, IEEE Computer, June 1984, pp. 8-19:

The abnormal case occurs whenever an input character string contains the sequence KwKwK, where Kw already appears in the compressor string table. The compressor will parse out Kw, send CODE(Kw), and add KwK to its string table. Next it will parse out KwK and send the just-generated CODE(KwK). The decompressor, on receiving CODE(KwK), will not yet have added that code to its string table because it does not yet know an extension character for the prior string.

The article explains how to work around this issue.

136

answered Sep 30 '22 02:09

chqrlie

Related questions
                            
                                Real numbers - how to determine whether float or double is required?
                            
                                How to use inet_pton() with the mingw compiler?
                            
                                Is is possible to write to different parts of the same file from multiple threads?
                            
                                Inserting an element in Binary Tree
                            
                                warning: return makes pointer from integer without a cast but returns integer as desired
                            
                                Printing to a file in C
                            
                                Main function with three arguments [duplicate]
                            
                                Does `const T *restrict` guarantee the object pointed-to isn’t modified?
                            
                                When using a coredump in gdb how do I know exactly which thread caused SIGSEGV? [duplicate]
                            
                                C/C++ arrays with threads - do I need to use mutexes or locks?
                            
                                Warn if another typedef'd name of a type is used in an argument list
                            
                                What does operator ':=' mean?
                            
                                Value of uninitialized elements in array of C language
                            
                                m32 and m64 compiler options provides different output
                            
                                Why is there a size mismatch between structures and unions?
                            
                                Storage Size of addrinfo isn't known
                            
                                Difference between .h files and .inc files in c
                            
                                Using round() function in c
                            
                                Deterministic way of saying "promote everything to floating before calculation" in C++
                            
                                Valid preprocessor tokens in macro concatenation

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Lempel-Ziv-Welch decompression non-existent index

Tags:

c

algorithm

compression

Marcos

People also ask

1 Answers

chqrlie

Recent Activity

Donate For Us