Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Encoding / Error Correction Challenge

Is it mathematically feasible to encode and initial 4 byte message into 8 bytes and if one of the 8 bytes is completely dropped and another is wrong to reconstruct the initial 4 byte message? There would be no way to retransmit nor would the location of the dropped byte be known.

If one uses Reed Solomon error correction with 4 "parity" bytes tacked on to the end of the 4 "data" bytes, such as DDDDPPPP, and you end up with DDDEPPP (where E is an error) and a parity byte has been dropped, I don't believe there's a way to reconstruct the initial message (although correct me if I am wrong)...

What about multiplying (or performing another mathematical operation) the initial 4 byte message by a constant, then utilizing properties of an inverse mathematical operation to determine what byte was dropped. Or, impose some constraints on the structure of the message so every other byte needs to be odd and the others need to be even.

Alternatively, instead of bytes, it could also be 4 decimal digits encoded in some fashion into 8 decimal digits where errors could be detected & corrected under the same circumstances mentioned above - no retransmission and the location of the dropped byte is not known.

I'm looking for any crazy ideas anyone might have... Any ideas out there?

EDIT:

It may be a bit contrived, but the situation that I'm trying to solve is one where you have, let's say, a faulty printer that prints out important numbers onto a form, which are then mailed off to a processing firm which uses OCR to read the forms. The OCR isn't going to be perfect, but it should get close with only digits to read. The faulty printer could be a bigger problem, where it may drop a whole number, but there's no way of knowing which one it'll drop, but they will always come out in the correct order, there won't be any digits swapped.

The form could be altered so that it always prints a space between the initial four numbers and the error correction numbers, ie 1234 5678, so that one would know whether a 1234 initial digit was dropped or a 5678 error correction digit was dropped, if that makes the problem easier to solve. I'm thinking somewhat similar to how they verify credit card numbers via algorithm, but in four digit chunks.

Hopefully, that provides some clarification as to what I'm looking for...

like image 229
user21293 Avatar asked Mar 06 '10 17:03

user21293


2 Answers

In the absence of "nice" algebraic structure, I suspect that it's going to be hard to find a concise scheme that gets you all the way to 10**4 codewords, since information-theoretically, there isn't a lot of slack. (The one below can use GF(5) for 5**5 = 3125.) Fortunately, the problem is small enough that you could try Shannon's greedy code-construction method (find a codeword that doesn't conflict with one already chosen, add it to the set).


Encode up to 35 bits as a quartic polynomial f over GF(128). Evaluate the polynomial at eight predetermined points x0,...,x7 and encode as 0f(x0) 1f(x1) 0f(x2) 1f(x3) 0f(x4) 1f(x5) 0f(x6) 1f(x7), where the alternating zeros and ones are stored in the MSB.

When decoding, first look at the MSBs. If the MSB doesn't match the index mod 2, then that byte is corrupt and/or it's been shifted left by a deletion. Assume it's good and shift it back to the right (possibly accumulating multiple different possible values at a point). Now we have at least seven evaluations of a quartic polynomial f at known points, of which at most one is corrupt. We can now try all possibilities for the corruption.

EDIT: bmm6o has advanced the claim that the second part of my solution is incorrect. I disagree.

Let's review the possibilities for the case where the MSBs are 0101101. Suppose X is the array of bytes sent and Y is the array of bytes received. On one hand, Y[0], Y[1], Y[2], Y[3] have correct MSBs and are presumed to be X[0], X[1], X[2], X[3]. On the other hand, Y[4], Y[5], Y[6] have incorrect MSBs and are presumed to be X[5], X[6], X[7].

If X[4] is dropped, then we have seven correct evaluations of f.

If X[3] is dropped and X[4] is corrupted, then we have an incorrect evaluation at 3, and six correct evaluations.

If X[5] is dropped and X[4] is corrupted, then we have an incorrect evaluation at 5, and six correct evaluations.

There are more possibilities besides these, but we never have fewer than six correct evaluations, which suffices to recover f.

like image 194
user287792 Avatar answered Sep 18 '22 19:09

user287792


I think you would need to study what erasure codes might offer you. I don't know any bounds myself, but maybe some kind of MDS code might achieve this.

EDIT: After a quick search I found RSCode library and in the example it says that

In general, with E errors, and K erasures, you will need
* 2E + K bytes of parity to be able to correct the codeword
* back to recover the original message data.

So looks like Reed-Solomon code is indeed the answer and you may actually get recovery from one erasure and one error in 8,4 code.

like image 31
Krystian Avatar answered Sep 19 '22 19:09

Krystian