I'm searching for a faster way for my required special extract and combine operation as described below:
+-------+-------+-------+-------+-------+-------+-------+-------+
| BIT 7 | BIT 6 | BIT 5 | BIT 4 | BIT 3 | BIT 2 | BIT 1 | BIT 0 |
+-------+-------+-------+-------+-------+-------+-------+-------+
| D1 | D0 | C1 | C0 | B1 | B0 | A1 | A0 |
+-------+-------+-------+-------+-------+-------+-------+-------+
A = A0 OR A1
B = B0 OR B1
C = C0 OR C1
D = D0 OR D1
+-------+-------+-------+-------+-------+-------+-------+-------+
| BIT 7 | BIT 6 | BIT 5 | BIT 4 | BIT 3 | BIT 2 | BIT 1 | BIT 0 |
+-------+-------+-------+-------+-------+-------+-------+-------+
| | | | | D | C | B | A |
+-------+-------+-------+-------+-------+-------+-------+-------+
For sake of simplicity above is only an 8-bit example, the same applies for 16 bit values. It should be implemented as fast as possible on dsPIC33F microcontroller.
The easy way in C is:
PairFlags |= (ChannelFlags & 0x0003) ? 0x0001 : 0;
PairFlags |= (ChannelFlags & 0x000C) ? 0x0002 : 0;
PairFlags |= (ChannelFlags & 0x0030) ? 0x0004 : 0;
PairFlags |= (ChannelFlags & 0x00C0) ? 0x0008 : 0;
PairFlags |= (ChannelFlags & 0x0300) ? 0x0010 : 0;
PairFlags |= (ChannelFlags & 0x0C00) ? 0x0020 : 0;
PairFlags |= (ChannelFlags & 0x3000) ? 0x0040 : 0;
PairFlags |= (ChannelFlags & 0xC000) ? 0x0080 : 0;
This will produce approx. 40 instructions (with O3) which corresponds to 1µs in my case.
The amount of instruction cycles should be reduced if possible. Is there a faster way either in C or inline assembly?
The following should work for reducing a 16-bit value to 8 bits (with each bit of output formed by ORing a pair of bits of input):
// Set even bits to bits in pair ORed together, and odd bits to 0...
PairFlags = (ChannelFlags | (ChannelFlags >> 1)) & 0x5555; // '0h0g0f0e0d0c0b0a'
// Compress the '00' or '01' bit pairs down to single '0' or '1' bits...
PairFlags = (PairFlags ^ (PairFlags >> 1)) & 0x3333; // '00hg00fe00dc00ba'
PairFlags = (PairFlags ^ (PairFlags >> 2)) & 0x0F0F; // '0000hgfe0000dcba'
PairFlags = (PairFlags ^ (PairFlags >> 4)) & 0x00FF; // '00000000hgfedcba'
Note: The ^
can be replaced by |
in the above for the same result.
Assuming I got everything right (not tested), this seems to generate good, branch-free code at least on gcc and clang for x86 (-O3):
uint8_t convert (uint8_t ChannelFlags)
{
return ( ((ChannelFlags & A1A0)!=0) << A_POS ) |
( ((ChannelFlags & B1B0)!=0) << B_POS ) |
( ((ChannelFlags & C1C0)!=0) << C_POS ) |
( ((ChannelFlags & D1D0)!=0) << D_POS ) ;
}
This masks out each individual bitset, then check against zero to end up with 1
or 0
in a temporary int
. This value is shifted in position in the result, before everything is finally bitwise OR:ed together. Full code:
#include <stdint.h>
#define A1A0 (3u << 0)
#define B1B0 (3u << 2)
#define C1C0 (3u << 4)
#define D1D0 (3u << 6)
#define A_POS 0
#define B_POS 1
#define C_POS 2
#define D_POS 3
uint8_t convert (uint8_t ChannelFlags)
{
return ( ((ChannelFlags & A1A0)!=0) << A_POS ) |
( ((ChannelFlags & B1B0)!=0) << B_POS ) |
( ((ChannelFlags & C1C0)!=0) << C_POS ) |
( ((ChannelFlags & D1D0)!=0) << D_POS ) ;
}
clang disassembly x86 gives 18 instructions branch free:
convert: # @convert
test dil, 3
setne al
test dil, 12
setne cl
add cl, cl
or cl, al
test dil, 48
setne al
shl al, 2
or al, cl
mov ecx, edi
shr cl, 7
shr dil, 6
and dil, 1
or dil, cl
shl dil, 3
or al, dil
ret
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With