Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Binary run length encoding

I have a web form, for the contents of which I would like to generate a short representation in Base64. The form, among other things, contains a list of 264 binary values, the greater part of which are going to be 0 at any single time. (They represent regions on a geographical map). Even in Base64, this 264-bit number generates a long, intimidating string. I want to implement run-length encoding, as efficiently as possible. Can you help me with this? I've googled binary RLE, but have found nothing of use.

What I've tried this far - running RLE on the binary string using decimal counts and "A" as a separator denoting a change between 0 and 1, then converting the result from base 11 to base 64. For example:

00000000001111111000000010000000000000000000000001111111110001111010101000000000000000000000000000000000000111111111110111000000000000111111100000001000000000000000000000000111111111000111101010100000000000000000000000000000000000011111111111011100

becomes

10A5A5AA22A7A1A2AAAAAAA34A9AA1A10A5A5AA22A7A1A2AAAAAAA34A9AA1A

which in turn becomes

CNnbr/FxkgbbOw0LNAKgk65P8SdvaTG+t74o

or, in base 62,

6imo7zq1pqr2mqglTHzXwJRAksm7fvHZHWQK

It's better, but I still can't help but doubt if I'm doing something wrong - is using the digit "A" as a separator is the best way to do this?

And another update:

Thanks to @comingstorm, I have shortened the compressed string some more.

ILHHASCAASBYwwccDASYgAEgWDI=

As I mentioned it in the comments, real usage cases would generally result in an even shorter string.

like image 550
avramov Avatar asked Sep 29 '11 14:09

avramov


People also ask

What is run length encoding example?

In run length encoding, we replace each row with numbers that say how many consecutive pixels are the same colour, always starting with the number of white pixels. For example, the first row in the image above contains one white, two black, four white, one black, four white, two black, and one white pixel.

What is run length encoding good for?

RLE is particularly well suited to palette-based bitmap images such as computer icons, and was a popular image compression method on early online services such as CompuServe before the advent of more sophisticated formats such as GIF.

What is run length encoding best at?

This type of compression works best with simple images and animations that have a lot of redundant pixels. It's useful for black and white images in particular. For complex images and animations, if there aren't many redundant sections, RLE can make the file size bigger rather than smaller.

What is a run in run length encoding?

Run length encoding (RLE) One of the simplest examples of compression is RLE. RLE is a basic form of data compression that converts consecutive identical values into a code consisting of the character and the number marking the length of the run. The more similar values there are, the more values can be compressed.


2 Answers

Since you're coding bits, you probably want to use a bit-based RLE instead of a byte-based one. In this context, you should consider Elias gamma coding (or some variant thereof) to efficiently encode your run lengths.

A reasonable first approximation for your encoding format might be:

  • first bit = same as the first bit of the uncompressed string (to set initial polarity)
  • remaining bits: Elias coded lengths of successive bit runs (alternating 1 and 0)

Since you know how many bits are in your uncompressed string, you don't need a termination code; you can just add any necessary binary padding as arbitrary bits.

Note that it is always possible for the run-length "compression" to expand your bit string; if you're concerned about this, you can add another initial bit to indicate whether your data is in compressed or uncompressed format, limiting your compression overhead to 1 bit.

like image 55
comingstorm Avatar answered Oct 01 '22 07:10

comingstorm


264 bit, that are just 33 byte, and that are in base64 just 44 byte. I think this (very small) amount of information is hardly compressable. The sparse representation nulvinge refers too just stores the non zero elements and their values (as you have just 0/1), i.e. in your case just the index of the non zero bits. But as you have 264 possible bits - you need 9 bits for the index, which means, in case you have more than 29 non zero entries, you need already more than original.

Maybe your question is formulated wrong, but I dont see how 264 bits can lead to an intimidating base64 string (How do you generate it - maybe you translate not the 264 bits, but 264 ASCIIs chars (with the value 0 and 1) - that would explain your long result string?).

like image 29
flolo Avatar answered Oct 01 '22 07:10

flolo