Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimize an array of tribools for space

Let me start with some background:

By "tribool" I understand a variable which can hold one of the following values: true, false or null.

In question Copying array of ints vs pointers to bools , the OP wanted to have an array of tribools (more or less) which would be as small as possible.

With "a bit of" most basic bit-fu I came up a solution which used 2 bits per tribool and allowed to store the OP's array of 64 tribools in 16 bytes, which is OK.

The tribool mechanics I used were simple, like:

  • boolean A means "null or not null",
  • boolean B means "true or false if not null".

But then I thought... An algorithmical definition of a "bit" is:

A bit is the amount of information which specifies which of two equally probable events shall occur.

Clearly a true/false value is 1 bit big. Two true-false values as a whole are 2 bit big.

And what about our conceptual tribool?

My point is: In terms of the size of contained information, a tribool is bigger than 1 bit but smaller than 2 bits.

  • Justification 1: Assume we implement our if boolean as described above. If boolean A is "null", the value of boolean B is redundant and doesn't carry any relevant information.
  • Justification 2: It's impossible to store information from 2 independent boolean values in one tribool, so it has

(None of the above is a formal proof, but I believe that we can agree that about the "size" of the tribool being strictly bigger than 1 bit and strictly smaller than 2.)


My question is:

How to programatically take advantage of the fact that a tribool has less information than 2 bits, and implement in software (c, c++?) an array of N tribools which would have the memory footprint smaller than N/4 bytes for some N?

Yes, I do understand that such an implementation isn't really hardware-friendly and would perform slower than any common solution with redundance (as those presented in the OP's question). Let's just optimize for space, not for efficiency.

Clearly this implementation needs a different representation of a tribool than a pair of bools (which is by itself redundant, as described before). The theory says it's possible to achieve that goal and I like to see an actual implementation. Any ideas?

like image 674
Kos Avatar asked Dec 11 '10 12:12

Kos


2 Answers

Your intuition is correct, this is certainly possible. This is basically a form of arithmetic coding, or at least a simple instance of it.

The easiest way to think of it is to imagine encoding your array of "tribools" as a number in base 3 - e.g. 0=FALSE, 1=TRUE, 2=NULL. Then the following array:

{TRUE, FALSE, NULL, NULL, FALSE, FALSE, TRUE}

encodes to the number

1022001

which you can then convert to decimal in the normal way:

(1*3^0)+(0*3^1)+(0*3^2)+(2*3^3)+(2*3^4)+(0*3^5)+(1*3^6) = 946

Each tribool takes up ln(3)/ln(2) bits (about 1.58), so using this method you can store 20 tribools in 32 bits - so you can store an N=20 array in 4 bytes (where N/4 is 5).

like image 92
psmears Avatar answered Oct 30 '22 11:10

psmears


You can theoretically pack X N-state variables in

ln(N^X) / ln M

M-state (or log_M (N^X) in LaTeX-like notation) variables. For storing tri-state variables in binary digits the formula above becomes:

ln(3^N) / ln 2

In an 8-bit byte, for example you could fit 5 tri-state variables.

Unpacking/Modifying those values would be a lot harder and slower as you pack variables more densely. In the example above you would have to recalculate the whole byte in order to change a single tri-state variable.

It should be noted that a byte for 5 tri-state variables is quite space-efficient. The density remains the same per-byte, until you have a pack of 22 bytes, which can fit 111 tri-state values, instead of 110. Handling that kind of packing would a mess, though.

Is any of this worth the extra work in comparison to directly storing 4 tri-state values in a byte?

like image 23
thkala Avatar answered Oct 30 '22 10:10

thkala