I have to combine two 16bit-words into a 32bit-word several hundreds times, which takes a lot of computation power. I would like to find out a more efficient way to do this.
I have 2 16bit-words named A and B. I want to have a 32bit-word named C. The bits in A should be copied to the even number bits in C. The bits in B should be copied to the odd number bits in C. For example: A: 0b0000000000000000 B:0b1111111111111111 The processed C should be 0b10101010101010101010101010101010.
My current solution looks like this:
for (i = 0; i < 32; i+=2)
{
C |= (A & (1 << (i/2))) << (i/2);
C |= (B & (1 << (i/2))) << (i/2 + 1);
}
This solution takes too much time when I have several hundreds of C to deal with. I am looking for a better one!
Added: This program runs on TriCore. I have no choice but to deal with the data in this way because this relation between AB and C is defined by the protocol.
Thank you!
Turns out Tricore has a BMERGE
instruction that does precisely what you want -- it takes two 16-bit values and interleaves the bits. If you're using the gcc-based toolchain, you should be able to use a single inline asm statement -- something like:
asm("bmerge %0,%1,%2" : "=r"(C) : "r"(A), "r"(B))
There's also a BSPLIT
instruction that does the reverse.
Rather than a loop, shift in groups.
Some further simplifications possible, but below is the gist of it. Is it faster on average (or worst-case)? Profile to find out.
#include <inttypes.h>
#include <stdint.h>
uint64_t Merge(uint32_t a, uint32_t b) {
uint64_t A,B;
A = ((a & 0x00000000FFFF0000ull) << 16) | (a & 0x000000000000FFFFull);
A = ((A & 0x0000FF000000FF00ull) << 8) | (A & 0x000000FF000000FFull);
A = ((A & 0xF0F0F0F0F0F0F0F0ull) << 4) | (A & 0x0F0F0F0F0F0F0F0Full);
A = ((A & 0xCCCCCCCCCCCCCCCCull) << 2) | (A & 0x0333333333333333ull);
A = ((A & 0xAAAAAAAAAAAAAAAAull) << 1) | (A & 0x5555555555555555ull);
B = ((b & 0x00000000FFFF0000ull) << 16) | (b & 0x000000000000FFFFull);
B = ((B & 0x0000FF000000FF00ull) << 8) | (B & 0x000000FF000000FFull);
B = ((B & 0xF0F0F0F0F0F0F0F0ull) << 4) | (B & 0x0F0F0F0F0F0F0F0Full);
B = ((B & 0xCCCCCCCCCCCCCCCCull) << 2) | (B & 0x0333333333333333ull);
B = ((B & 0xAAAAAAAAAAAAAAAAull) << 1) | (B & 0x5555555555555555ull);
return A | (B << 1);
}
void MergeTest(uint32_t a, uint32_t b) {
uint64_t C = Merge(a,b);
printf("a:%08" PRIX32 " b:%08" PRIX32 " c:%016" PRIX64 "\n", a,b,C);
}
void MergeTests(void) {
MergeTest(0x00000000L, 0xFFFFFFFFL);
MergeTest(0xFFFFFFFFL, 0x00000000L);
MergeTest(0x00000000L, 0x00000001L);;
MergeTest(0x00000000L, 0x00000010L);;
}
a:00000000 b:FFFFFFFF c:AAAAAAAAAAAAAAAA
a:FFFFFFFF b:00000000 c:5555555555555555
a:00000000 b:00000001 c:0000000000000002
a:00000000 b:00000010 c:0000000000000200
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With