Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

redundant encoding?

This is more of a computer science / information theory question than a straightforward programming one, so if anyone knows of a better site to post this, please let me know.

Let's say I have an N-bit piece of data that will be sent redundantly in M messages, where at least M-1 of those messages will be received successfully. I am interested in different ways of encoding the N-bit piece of data in fewer bits per message. (this is similar to RAID but at a much smaller level, where N = 8 or 16 or 32)

Example: suppose N = 16 and M = 4. Then I could use the following algorithm:

1st and 3rd message: send "0" + bits 0-7
2nd and 4th message: send "1" + bits 8-15

If I can guarantee that 3 messages of the 4 will get through, then at least one message from each group will get through. Thus I can make this work with 9 bits or less, there's probably a way to do this with fewer total bits but I'm not sure how.

Are there some simple encoding/decoding algorithms to do this kind of thing? Does this problem have a name? (if I know what it's called, I can google it!)

note: in my particular case, the messages either arrive correctly or do not arrive at all (no messages arrive with errors).

(edit: moved 2nd part to a separate question)

like image 624
Jason S Avatar asked Dec 29 '22 03:12

Jason S


2 Answers

(Incomplete answer follows. I may add more later.)

The term you may be interested in is channel coding: adding redundancy to a source in order to make it robust during transmission over a noisy channel. In information theory, the complementary problem to channel coding is source coding: reducing the redundancy in a source to represent it using fewer bits. (The combination of these two problems is called joint source-channel coding.)

Your first question asks to find a channel code. The simple example you give is similar to a repetition code, i.e., you send the same message more than twice (usually an odd number of times), and then the message which is received most often is accepted as the original message.

This code is inefficient. To use standard notation, let k = number of bits in original message, and n = number of bits in the transmitted message. For your example, k = 16 and n = 36. A measure of coding efficiency is k/n, where higher means more efficient. In your case, k/n = 0.44. This is low.

The repetition code is a simple kind of block code, i.e., redundancy is added to each block of k bits to create a codeword of n bits. So are the Hamming and Reed-Solomon codes as others mentioned. Hamming codes are relatively easy to understand with some basic linear algebra.

These should be enough terms for you to search on your own. Good luck.

like image 111
Steve Tjoa Avatar answered Jan 05 '23 18:01

Steve Tjoa


I'm not sure if I understood all the details of your question correctly, but your problem is definitely aboud designing some kind of error correcting code. This is a vast area of computer science and thick tomes have been written about it. Start with wikipedia and see if you can get any simple schemes (like Hamming or Reed-Solomon codes) to work in your case.

If you want to deal not only with symbol corruption, but also deletion of symbols, you should look at erasure codes, this is definitely a more difficult task but good methods exist in many cases.

EDIT: This material from hackersdelight.org seems a nice introduction.

like image 26
Krystian Avatar answered Jan 05 '23 19:01

Krystian