I have three buffers containing R, G and B bit data on a 32-bit processor.
I need to combine one byte from each buffer in the following way:
R[0] = 0b r1r2r3r4r5r6r7r8
G[0] = 0b g1g2g3g4g5g6g7g8
B[0] = 0b b1b2b3b4b5b6b7b8
int32_t Out = 0b r1g1b1r2g2b2r3g3 b3r4g4b4r5g5b5r6 g6b6r7g7b7r8g8b8 xxxxxxxx
where xxxxxxxx is continuing on to each of the next bytes in the buffers.
I am looking for an optimal way to combine them; my current approach is definitely not efficient. Here it is:
static void rgbcombineline(uint8_t line)
{
    uint32_t i, bit;
    uint8_t bitMask, rByte, gByte, bByte;
    uint32_t ByteExp, rgbByte;
    uint8_t *strPtr = (uint8_t*)&ByteExp;   /* byte view of ByteExp; strPtr[2] holds bits 16-23 on a little-endian CPU */

    for (i = 0; i < (LCDpixelsCol / 8); i++)
    {
        rByte = rDispbuff[line][i];
        gByte = gDispbuff[line][i];
        bByte = bDispbuff[line][i];
        bitMask = 0b00000001;
        ByteExp = 0;

        /* Interleave the 8 bits of each colour byte, one bit at a time */
        for(bit = 0; bit < 8; bit++)
        {
            rgbByte = 0;
            rgbByte |= ((rByte & bitMask) >> bit) << 2;
            rgbByte |= ((gByte & bitMask) >> bit) << 1;
            rgbByte |= ((bByte & bitMask) >> bit);
            ByteExp |= (rgbByte << 3*bit);
            bitMask <<= 1;
        }

        /* Store the 24-bit result most-significant byte first */
        TempLinebuff[((i*3)+0) +2] = *(strPtr + 2);
        TempLinebuff[((i*3)+1) +2] = *(strPtr + 1);
        TempLinebuff[((i*3)+2) +2] = *(strPtr + 0);
    }
}
If you can spare 1024 bytes, you can achieve your desired result with a single 256-element lookup table:
uint32_t lookup[256] = {
0, 1, 8, 9, 64, 65, ...
/* map abcdefgh to a00b00c00d00e00f00g00h */
};
uint32_t result = (lookup[rByte] << 2) | (lookup[gByte] << 1) | lookup[bByte];
This uses only 3 lookups, 2 shifts and 2 OR operations, which should provide an acceptable speedup.
If you have more space, you can use three lookup tables to eliminate the shifts too (although this may result in worse cache performance, so always profile to check!)
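For illustration, here is one way such a table could be built at start-up instead of being written out by hand. This is only a sketch, and the names init_lookup and interleave_rgb are mine, not part of the answer:

#include <stdint.h>

/* lookup[] spreads the bits abcdefgh of a byte out to a00b00c00d00e00f00g00h,
   i.e. bit k of the input ends up at bit 3*k of the output. */
static uint32_t lookup[256];

static void init_lookup(void)
{
    for (int v = 0; v < 256; v++)
    {
        uint32_t spread = 0;
        for (int bit = 0; bit < 8; bit++)
            spread |= (uint32_t)((v >> bit) & 1u) << (3 * bit);
        lookup[v] = spread;
    }
}

/* One byte from each colour plane -> 24-bit r1g1b1 r2g2b2 ... r8g8b8 value */
static uint32_t interleave_rgb(uint8_t rByte, uint8_t gByte, uint8_t bByte)
{
    return (lookup[rByte] << 2) | (lookup[gByte] << 1) | lookup[bByte];
}

With this, the whole 8-iteration inner loop in rgbcombineline() collapses to ByteExp = interleave_rgb(rByte, gByte, bByte); and the three output bytes are stored exactly as before.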
You can use a multiplication by a "magical" constant to replicate the bits, then bit-shifts to move the needed bits into place and bit-wise masking to combine them. The "magical" constant is the 17-bit binary value 10000000100000001 (0x10101): multiplying any 8-bit number by it concatenates that number to itself 3 times.
r1r2r3r4r5r6r7r8 * M       = r1r2r3r4r5r6r7r8r1r2r3r4r5r6r7r8r1r2r3r4r5r6r7r8
r1r2r3r4r5r6r7r8 * M shr 2 = 0 0 r1r2r3r4r5r6r7r8r1r2r3r4r5r6r7r8r1r2r3r4r5r6
r1r2r3r4r5r6r7r8 * M shr 4 = 0 0 0 0 r1r2r3r4r5r6r7r8r1r2r3r4r5r6r7r8r1r2r3r4
r1r2r3r4r5r6r7r8 * M shr 6 = 0 0 0 0 0 0 r1r2r3r4r5r6r7r8r1r2r3r4r5r6r7r8r1r2
In each of these shifted copies, some bits land exactly where the output needs them: r1 and r5 in the unshifted product, r2 and r6 after shifting by 2, r3 and r7 after shifting by 4, and r4 and r8 after shifting by 6.
If you use this masking code
R * M & 0b100000000000100000000000 |
(R * M >> 2) & 0b000100000000000100000000 |
(R * M >> 4) & 0b000000100000000000100000 |
(R * M >> 6) & 0b000000000100000000000100
you will get the "red" bits combined in the right way:
r1 0 0 r2 0 0 r3 0 0 r4 0 0 r5 0 0 r6 0 0 r7 0 0 r8 0 0
Then combine the "blue" and "green" bits in a similar way.
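For illustration, here is a sketch of the complete trick in C. The function names are mine, the masks are the ones from the answer written in hex, and for the green and blue channels I simply shift the spread value right by 1 and 2 bits, which is equivalent to shifting the mask constants:

#include <stdint.h>

#define M 0x10101u   /* 0b10000000100000001: replicates a byte three times */

/* Spread one byte so that its MSB (r1) lands at bit 23, the next bit at
   bit 20, ... and the LSB (r8) at bit 2: r1 0 0 r2 0 0 ... r8 0 0 */
static uint32_t spread_bits(uint8_t v)
{
    uint32_t x = (uint32_t)v * M;     /* r1..r8 r1..r8 r1..r8 (24 bits) */
    return  (x        & 0x800800u)    /* keeps r1 (bit 23) and r5 (bit 11) */
          | ((x >> 2) & 0x100100u)    /* keeps r2 (bit 20) and r6 (bit 8)  */
          | ((x >> 4) & 0x020020u)    /* keeps r3 (bit 17) and r7 (bit 5)  */
          | ((x >> 6) & 0x004004u);   /* keeps r4 (bit 14) and r8 (bit 2)  */
}

/* 24-bit result: bits 23..21 = r1 g1 b1, bits 20..18 = r2 g2 b2, ... */
static uint32_t interleave_rgb_mul(uint8_t r, uint8_t g, uint8_t b)
{
    return spread_bits(r) | (spread_bits(g) >> 1) | (spread_bits(b) >> 2);
}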
A rough estimate of the number of operations per three input bytes, assuming each product R * M, G * M and B * M is computed once: 3 multiplications, about 9-11 shifts, 12 AND operations and 11 OR operations, compared with dozens of operations in the bit-by-bit loop.