Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiency of custom compression algorithm

I have a compression algorithm idea and I have two questions:

  1. Should I deal with it ? Will it be efficient ?
  2. How can I optimize it?

Here is the algorithm I've created so far.

int i = 0,j, diff, beginIndex = 0;
while(i < tmp.length){
    j = i;
    byte first = tmp[i];
    int total = 0;
    while(j < tmp.length && first == tmp[j] && total < 127){ j++; total++;}

    if(total > 3){
        if(beginIndex != i){
            diff = i - beginIndex;
            packed.put((byte)diff);
            packed.put(tmp, beginIndex, diff);
        }
        packed.put((byte)(0x80 | total));
        packed.put(tmp[i]);
        beginIndex = j; 
    } 

    i = j;

    if(i-beginIndex == 127){
        packed.put((byte)127);
        packed.put(tmp, beginIndex, 127);
        beginIndex = i;
    }
}

if(beginIndex < i){
    diff = i - beginIndex;
    packed.put((byte)diff);
    packed.put(tmp, beginIndex, diff);
}

Example input (each letter describes a byte)

[A, B, C, D, E, E, B, B, A, A, A, A, A, A, A, A, A, A, A, A, A, B, B, B, B, C, C] = 27 bytes

Example output

[0x80, A, B, C, D, E, E, B, B, 0x8D, A, 0x84, B, 0x82, C, C] = 16 bytes

In examples 0x80 is the packed bit. Represents if following letter will be repeated. 0xFF - 0x80 = 0x7F is the maksimum repeat count (127). So, 0x8D represents following byte will be repeated 0xD (13) times

Any idea optimizing that algorithm? Will it be useful or shall I get rid off the idea?

like image 408
webdev Avatar asked May 20 '26 16:05

webdev


1 Answers

The question is, what is the purpose of your algorithm?

To invent something really new, you need to check, what was invented before. Read some papers and book about data compression, etc. Data Compression Explained can be good place to start.

If you just want to practice in writing algorithms, it's totally OK. Continue with improving your algorithm, refactoring, speedups, profiling, etc.

If you want your algorithm to be practical, once again, check what was created before. Open source compression algorithms, such as zlib worth studying.

If you want to check, how your algorithm compares to others, run it on some popular test, such as Silesia Open Source Compression Benchmark. That'll give your an intuition where you stand (this might be somewhat disappointing, but don't give up).

Finally, if you want to have fun, just do whatever you want, and don't listen to anyone.

like image 78
EvgeniyZh Avatar answered May 23 '26 05:05

EvgeniyZh