Let me start with some background:
By "tribool" I understand a variable which can hold one of the following values: true
, false
or null
.
In question Copying array of ints vs pointers to bools , the OP wanted to have an array of tribools (more or less) which would be as small as possible.
With "a bit of" most basic bit-fu I came up a solution which used 2 bits per tribool and allowed to store the OP's array of 64 tribools in 16 bytes, which is OK.
The tribool mechanics I used were simple, like:
But then I thought... An algorithmical definition of a "bit" is:
A bit is the amount of information which specifies which of two equally probable events shall occur.
Clearly a true/false value is 1 bit big. Two true-false values as a whole are 2 bit big.
And what about our conceptual tribool?
My point is: In terms of the size of contained information, a tribool is bigger than 1 bit but smaller than 2 bits.
(None of the above is a formal proof, but I believe that we can agree that about the "size" of the tribool being strictly bigger than 1 bit and strictly smaller than 2.)
My question is:
How to programatically take advantage of the fact that a tribool has less information than 2 bits, and implement in software (c, c++?) an array of N tribools which would have the memory footprint smaller than N/4
bytes for some N?
Yes, I do understand that such an implementation isn't really hardware-friendly and would perform slower than any common solution with redundance (as those presented in the OP's question). Let's just optimize for space, not for efficiency.
Clearly this implementation needs a different representation of a tribool than a pair of bools (which is by itself redundant, as described before). The theory says it's possible to achieve that goal and I like to see an actual implementation. Any ideas?
Your intuition is correct, this is certainly possible. This is basically a form of arithmetic coding, or at least a simple instance of it.
The easiest way to think of it is to imagine encoding your array of "tribools" as a number in base 3 - e.g. 0=FALSE, 1=TRUE, 2=NULL. Then the following array:
{TRUE, FALSE, NULL, NULL, FALSE, FALSE, TRUE}
encodes to the number
1022001
which you can then convert to decimal in the normal way:
(1*3^0)+(0*3^1)+(0*3^2)+(2*3^3)+(2*3^4)+(0*3^5)+(1*3^6) = 946
Each tribool takes up ln(3)/ln(2) bits (about 1.58), so using this method you can store 20 tribools in 32 bits - so you can store an N=20
array in 4 bytes (where N/4
is 5).
You can theoretically pack X N-state variables in
ln(N^X) / ln M
M-state (or log_M (N^X) in LaTeX-like notation) variables. For storing tri-state variables in binary digits the formula above becomes:
ln(3^N) / ln 2
In an 8-bit byte, for example you could fit 5 tri-state variables.
Unpacking/Modifying those values would be a lot harder and slower as you pack variables more densely. In the example above you would have to recalculate the whole byte in order to change a single tri-state variable.
It should be noted that a byte for 5 tri-state variables is quite space-efficient. The density remains the same per-byte, until you have a pack of 22 bytes, which can fit 111 tri-state values, instead of 110. Handling that kind of packing would a mess, though.
Is any of this worth the extra work in comparison to directly storing 4 tri-state values in a byte?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With