I don't get Golomb / Rice coding: It makes the output longer than the input, or doesn't it?

Tags:

compression

Or, maybe, what I don't get is unary coding:

In Golomb, or Rice, coding, you split a number N into two parts by dividing it by another number M and then code the integer result of that division in unary and the remainder in binary.

In the Wikipedia example, they use 42 as N and 10 as M, so we end up with a quotient q of 4 (in unary: 11110) and a remainder r of 2 (in binary: 010), so that the resulting message is 11110,010, or 8 bits (the comma can be skipped). The simple binary representation of 42 is 101010, or 6 bits.

To me, the extra length seems to be due to the unary representation of q, which never takes fewer bits than binary and usually takes more.

Clearly, I'm missing some important point here. What is it?


asked Apr 08 '09 08:04

Hanno Fietz

2 Answers

The important point is that Golomb codes are not meant to be shorter than the shortest binary encoding for one particular number. Rather, by providing a specific kind of variable-length encoding, they reduce the average length per encoded value compared to fixed-width encoding, if the encoded values are from a large range, but the most common values are generally small (and hence are using only a small fraction of that range most of the time).

As an example, if you were to transmit integers in the range from 0 to 1000, but a large majority of the actual values were in the range between 0 and 10, in a fixed-width encoding, most of the transmitted codes would have leading 0s that contain no information:

To cover all values between 0 and 1000, you need a 10-bit wide encoding in fixed-width binary. Now, as most of your values would be below 10, at least the first 6 bits of most numbers would be 0 and would carry little information.

To rectify this with Golomb codes, you split the numbers by dividing them by 10 and encoding the quotient and the remainder separately. For most values, all that would have to be transmitted is the remainder which can be encoded using 4 bits at most (if you use truncated binary for the remainder it can be less). The quotient is then transmitted in unary, which encodes as a single 0 bit for all values below 10, as 10 for 10..19, 110 for 20..29 etc.
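The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`unary`, `truncated_binary`, `golomb_encode`) are made up for this example and do not come from any library, and bits are returned as a string of '0' and '1' characters to keep the output readable.

```python
def unary(q):
    """Unary code as used above: q one-bits followed by a terminating zero."""
    return "1" * q + "0"


def truncated_binary(r, m):
    """Truncated binary code for a remainder r in the range 0..m-1."""
    if m == 1:
        return ""                      # the remainder is always 0, nothing to send
    b = (m - 1).bit_length()           # ceil(log2(m))
    cutoff = (1 << b) - m              # this many remainders get the short form
    if r < cutoff:
        return format(r, f"0{b - 1}b")         # short codeword, b - 1 bits
    return format(r + cutoff, f"0{b}b")        # long codeword, b bits


def golomb_encode(n, m):
    """Golomb codeword for a non-negative integer n with divisor m."""
    q, r = divmod(n, m)
    return unary(q) + truncated_binary(r, m)


print(golomb_encode(42, 10))   # quotient 4, remainder 2
print(golomb_encode(3, 10))    # quotient 0, so only one quotient bit
```

With M = 10, the value 42 encodes as 11110 followed by 010 (8 bits), while any value below 10 needs only the single 0 quotient bit plus 3 or 4 remainder bits.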

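Now, for most of your values, you have reduced the message size to 5 bits max, but you are still able to transmit all values unambiguously without separators.

This comes at a rather high cost for the larger values (for example, values in the range 990..999 need 100 bits for the quotient), which is why the coding is optimal only for geometric distributions, where the probability of a value falls off exponentially as the value grows. (Signed values with a two-sided geometric distribution are usually mapped to non-negative integers first.)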
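To put numbers on this trade-off, here is a small Python sketch that counts codeword bits instead of building the codewords. The helper names and the sample data are invented for illustration (90 values below 10 plus a few larger ones); it assumes M >= 2 and truncated binary for the remainder.

```python
def golomb_length(n, m):
    """Number of bits in the Golomb codeword for n >= 0 with divisor m >= 2."""
    q, r = divmod(n, m)
    b = (m - 1).bit_length()           # ceil(log2(m))
    cutoff = (1 << b) - m              # remainders below this use b - 1 bits
    remainder_bits = b - 1 if r < cutoff else b
    return (q + 1) + remainder_bits    # unary quotient: q ones plus one zero


def average_bits(values, m):
    """Mean codeword length over a list of values."""
    return sum(golomb_length(v, m) for v in values) / len(values)


# Hypothetical sample: mostly small values, a few larger ones.
sample = list(range(10)) * 9 + [15] * 5 + [100] * 5

print(golomb_length(5, 10))       # a typical small value
print(golomb_length(999, 10))     # 100 quotient bits plus the remainder
print(average_bits(sample, 10))   # compare with 10 bits per value at fixed width
```

On this made-up sample the average comes out just under 5 bits per value, about half of what a 10-bit fixed-width code would use, even though the single value 999 would cost 104 bits.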

The long runs of 1 bits in the quotients of larger values can be addressed with subsequent run-length encoding. However, if the quotients consume too much space in the resulting message, this could indicate that other codes might be more appropriate than Golomb/Rice.

answered Oct 21 '22 05:10

Hanno Fietz

One difference between Golomb coding and plain binary is that plain binary is not a prefix code, which is a no-go for coding sequences of arbitrarily large numbers (you cannot decide whether 1010101010101010 is a concatenation of 10101010 and 10101010 or something else). Hence, the two are not that easily comparable.

Second, the Golomb code is optimal for a geometric distribution, in this case with parameter 2^(-1/10). Under that distribution the probability of 42 is only about 0.36 %, which gives you an idea of how much such a rare value should matter for the length of the output string.
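The prefix property is what lets a decoder split a concatenated bit string back into values without separators. A minimal Python sketch, assuming well-formed input, M >= 2, and truncated binary remainders (the function name `golomb_decode_stream` is made up for this example):

```python
def golomb_decode_stream(bits, m):
    """Decode a string of concatenated Golomb codewords (divisor m >= 2)."""
    if m < 2:
        raise ValueError("this sketch assumes m >= 2")
    b = (m - 1).bit_length()           # ceil(log2(m))
    cutoff = (1 << b) - m              # remainders below this use b - 1 bits
    values = []
    i = 0
    while i < len(bits):
        # Unary quotient: count one-bits up to the terminating zero.
        q = 0
        while bits[i] == "1":
            q += 1
            i += 1
        i += 1                         # skip the terminating zero
        # Truncated binary remainder: read b - 1 bits first.
        r = int(bits[i:i + b - 1], 2) if b > 1 else 0
        i += b - 1
        if r >= cutoff:                # long form: one more bit follows
            r = 2 * r + int(bits[i]) - cutoff
            i += 1
        values.append(q * m + r)
    return values


# Codewords for 42, 3, 10 and 9 with M = 10, written back to back.
stream = "11110010" + "0011" + "10000" + "01111"
print(golomb_decode_stream(stream, 10))
```

The decoder never needs to look ahead or backtrack: the terminating zero of the unary part and the fixed rule for the remainder tell it exactly where each codeword ends.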
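The figure can be checked with a few lines of Python. This assumes the usual parametrization of the geometric distribution on non-negative integers, P(N = n) = (1 - p) * p^n, which is not spelled out above; the variable names are illustrative.

```python
import math

M = 10
p = 2 ** (-1 / M)              # geometric parameter quoted above
n = 42

prob = (1 - p) * p ** n        # assumed model: P(N = n) = (1 - p) * p**n
ideal_bits = -math.log2(prob)  # information content of that outcome

print(f"P(N = {n}) = {prob:.4%}")
print(f"ideal code length = {ideal_bits:.2f} bits")
```

Under that model the information content of the value 42 is about 8.1 bits, so the 8-bit Golomb codeword is close to what the distribution calls for; the 6-bit plain binary form is shorter only because it cannot be decoded without knowing where it ends.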

answered Oct 21 '22 05:10

jpalecek
