Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently bitshifting bytes in 32/64 bit quantities?

For simplicity, assume I'm using a 32 bit little-endian processor and have declared the following 4-byte buffer:

unsigned char buffer[] = { 0xab, 0xcd, 0xef, 0x46 };

Let's say my goal is to bit-wise left shift each byte in the buffer by 4 bits. That is, I want to transform the buffer values to be: { 0xbc, 0xde, 0xf4, 0x60 }. To perform such a transformation, one might write code like:

for (int i = 0; i < 3; ++i)
{
  buffer[i] <<= 4; 
  buffer[i] |= (buffer[i + 1] >> 4);
}
buffer[3] <<= 4;

While this works, I'd much rather shift all 4 bytes simultaneously using my processor's native 32-bit registers:

unsigned char buffer[] = { 0xab, 0xcd, 0xef, 0x46 };
unsigned int *p = (unsigned int*)buffer; // unsigned int is 32 bit on my platform
*p <<= 4;

The above snippet successfully performs a shift, but not in the way I'm looking for. It appears that since I'm casting buffer to an unsigned int, the register is loaded (little-endian) with the value 0x46efcdab (instead of 0xabcdef46). Consequently, performing the 4-bit left shift results in 0xb0dafc6e instead of 0xbcdef460.

Aside from swapping bytes before the shift (e.g. htonl et al.) are there any tricks for efficiently shifting bytes in the manner I am seeking?

Thank you in advance for your insights.

like image 907
digitale Avatar asked Jan 06 '23 03:01

digitale


2 Answers

Use htonl/ntohl to flip between network (big-endian) byte order and native byte order:

uint32_t *p = (uint32_t*)buffer;
*p = htonl(ntohl(*p) << 4);

In effect, this loads the buffer contents as an integer in big-endian order, performs the shift, then writes it back in big-endian order.

This compiles into a couple of bswap instructions on x86, so it should be reasonably efficient (gcc -O3).


Here's some test code (buffer is global to avoid constant-folding, and the return prevents dead-code elimination):

#include <stdint.h>    // uint32_t
#include <arpa/inet.h> // ntohl, htonl

unsigned char buffer[] = { 0xab, 0xcd, 0xef, 0x46 };

int main() {
    uint32_t *p = (uint32_t*)buffer; // unsigned int is 32 bit on my platform
    *p = htonl(ntohl(*p) << 4);
    return *p;
}

This compiles into the following fairly simple machine code (x86-64; LLVM 7.0.2; cc -O2):

0000000000000000    pushq   %rbp           ; frame setup
0000000000000001    movq    %rsp, %rbp     ; frame setup
0000000000000004    movl    (%rip), %eax   ; load buffer
000000000000000a    bswapl  %eax           ; endian flip
000000000000000c    shll    $0x4, %eax     ; shift
000000000000000f    bswapl  %eax           ; endian flip
0000000000000011    movl    %eax, (%rip)   ; save buffer
0000000000000017    popq    %rbp           ; finish
0000000000000018    retq
like image 94
nneonneo Avatar answered Jan 23 '23 10:01

nneonneo


Just for comparison, you can do this without the use of htonl/ntohl. This assumes a little-endian CPU:

#include <stdint.h>

void lshift(unsigned char* buf) {
  uint32_t* p = (uint32_t*)buf;
  uint32_t lo = *p & 0x0F0F0F0F;
  uint32_t hi = *p & 0xF0F0F000;
  *p = (lo << 4) | (hi >> 12);
}

And the generated assembly with gcc -O3:

pushq   %rbp
movq    %rsp, %rbp
movl    (%rdi), %eax
movl    %eax, %ecx
shll    $4, %ecx
andl    $-252645136, %ecx       ## imm = 0xFFFFFFFFF0F0F0F0
shrl    $12, %eax
andl    $986895, %eax           ## imm = 0xF0F0F
orl     %ecx, %eax
movl    %eax, (%rdi)
popq    %rbp
retq

Depending on how many cycles bswapl is, it's likely the faster alternative.

like image 42
Chris Schmich Avatar answered Jan 23 '23 11:01

Chris Schmich