Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

There must be a really fast way to calculate this bitwise expression?

Let v and w be two bitstrings. In the current application they consist of 8 bits. I am looking for the fastest way to calculate the following expression.

x = (v[1] & w[0]) ^ (v[2] & w[1]) ^ (v[2] & w[0]) ^ (v[3] & w[2]) ^ (v[3]) & w[1]) ^ (v[3] & w[0]) ^ ...

Some ideas on the subject: one thing I noticed is that this expression can also be written as below. Let

P(w[k]) = w[k] ^ w[k-1] ^ ... ^ w[0]

denote the parity of the lowest k + 1 bits of w. Then

x = (v[1] & P(w[0])) ^ (v[2] & P(w[1])) ^ (v[3] & P(w[2])) ^ ... ^ (v[7] & P(w[6]))

Now if Pw is a bitstring in which each bit denotes the parity of the lower bits, i.e. for which Pw[i] = P(w[i-1]) then x could be written as follows:

x = P(v & Pw)

Now, on http://graphics.stanford.edu/~seander/bithacks.html I found a quick way to calculate the parity of a string, but in order to build a fast algorithm based on this, I would also need a fast way to calculate the bitstring Pw described above.

Or maybe I'm going about this in the wrong way completely, there are an awful lot of parity calculations to be done this way. If this is indeed the way to go, I was wondering if it would be possible (assuming the program will run on x86) to use the parity flag in assembly to speed up the calculation.

Finally, this would be a calculation that would be needed a LOT in the application I am developing, so speed is really of the seence. I was wondering if it would be possible to do the entire calculation within a register and if this could be faster than creating a lookup table in memory.

like image 675
user2987424 Avatar asked Nov 13 '13 13:11

user2987424


2 Answers

If v and w are truly 8 bits, then you could just precalculate all 256^2 combinations and store the result in a table of 65K bytes. That will easily fit into a cache. Your computation then becomes:

  precomputed[v<<8+w]

which is a few machine clocks and a hot cache line lookup. Might be hard to beat.

like image 52
Ira Baxter Avatar answered Oct 19 '22 05:10

Ira Baxter


On x86 the parity bit is automatically calculated for low 8-bit arithmetic operations. Basically the required operations are reduced to:

 Pw = Lookup_256[w];
 v &= Pw;                 // get the Parity as side effect on x86, or

 v  = Lookup_256[v] >> 7; // Reuse the table to get parity for bit 7

EDIT

Higher level optimizations and parallel implementation is achievable by recognizing that the partial products (v[i] & w[j]) are internal part of multiplication and that the concatenation with the operator ^ makes this overall operation carry-less (or polynomial).

The overall operation would be Parity( ((v >> 1) Px w) & 0xff), where Px denotes polynomial multiplication, which is supported in e.g. NEON and in intel architecture with PCLMULQDQ instruction. The Intel instruction (unfortunately) operates in 64-bit words, making it probably possible, but difficult to incorporate several independent vectors v,w to be multiplied simultaneously.

like image 3
Aki Suihkonen Avatar answered Oct 19 '22 05:10

Aki Suihkonen