
Fastest way to split a word into two bytes

So what is the fastest way to split a word into two bytes?

short s = 0x3210;
char c1 = s >> 8;
char c2 = s & 0x00ff;

versus

short s = 0x3210;
char c1 = s >> 8;
char c2 = (s << 8) >> 8;

Edit

How about

short s = 0x3210;
char* c = (char*)&s; // where c1 = c[0] and c2 = c[1]
Jonas asked Dec 06 '22 06:12


2 Answers

Let the compiler do this work for you. Use a union, where the bytes are split without any hand-made bit shifts. Look at the pseudocode:

union U {
  short s;        // or use int16_t to be more specific
  struct Byte {
    char c1, c2;  // or use int8_t to be more specific
  } byte;         // overlays the two bytes of s
};

Usage is simple:

U u;
u.s = 0x3210;
// Cast to int so the bytes print as numbers rather than as characters.
std::cout << std::hex << int(u.byte.c1) << " and " << int(u.byte.c2);

The concept is simple. Afterwards you can overload operators to make it fancier if you want.

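It is important to note that the order of c1 and c2 depends on the byte order (endianness) of the target platform, but that will be known before compilation. You can set some conditional macros to make sure that the order matches your needs on any platform. Also be aware that reading a union member other than the one last written is allowed in C but is formally undefined behavior in C++, even though mainstream compilers such as GCC support it.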
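The conditional-macro idea could look roughly like the sketch below. It assumes the `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__` macros that GCC and Clang predefine, and falls back to little-endian where they are missing. The names `SPLIT_HIGH_BYTE_FIRST`, `high` and `low` are made up for illustration, and unsigned fixed-width types are swapped in for `short` and `char`.

```cpp
#include <cstdint>

// Assumption: __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ are predefined by the
// compiler (true for GCC and Clang). Where they are missing, this sketch
// assumes a little-endian target.
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define SPLIT_HIGH_BYTE_FIRST 1
#else
#define SPLIT_HIGH_BYTE_FIRST 0
#endif

union U {
    std::uint16_t s;
    struct Byte {
#if SPLIT_HIGH_BYTE_FIRST
        std::uint8_t high, low;  // big-endian: high byte at the lower address
#else
        std::uint8_t low, high;  // little-endian: low byte at the lower address
#endif
    } byte;
};

static_assert(sizeof(U) == 2, "the struct must overlay the 16-bit value exactly");
```

With this layout, `u.byte.high` and `u.byte.low` should name the same logical byte on either kind of target, so calling code does not need to know the byte order.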

iammilind answered Dec 22 '22 07:12


I'm 99.9% sure the first one is at least as fast as the second on nearly all architectures. On some architectures it may make no difference (they are equal), and on several architectures the latter will be slower.

The main reason I'd say the second is slower is that there are two shifts to come up with the c2 number. The processor can't start to process the second shift until it has done the first shift.
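Writing both variants as functions makes the dependency visible. This is a minimal sketch that assumes an unsigned 16-bit input; the function names are illustrative and not from the question. One detail worth noting: C++ promotes the operand to `int` before shifting, so the two-shift form only discards the high byte if the intermediate result is truncated back to 16 bits.

```cpp
#include <cstdint>

// High byte: one shift.
constexpr std::uint8_t high_byte(std::uint16_t s) {
    return static_cast<std::uint8_t>(s >> 8);
}

// Variant 1: a single AND, independent of the shift used for the high byte.
constexpr std::uint8_t low_byte_mask(std::uint16_t s) {
    return static_cast<std::uint8_t>(s & 0x00FF);
}

// Variant 2: two shifts, where the right shift needs the left shift's result.
// s is promoted to int before shifting, so the intermediate value is
// truncated back to 16 bits to actually drop the high byte.
constexpr std::uint8_t low_byte_shifts(std::uint16_t s) {
    return static_cast<std::uint8_t>(
        static_cast<std::uint16_t>(s << 8) >> 8);
}
```

Both low-byte functions should return the same value for every input; they differ only in how many dependent operations they ask for before optimisation.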

Also, the compiler may well be able to do other clever things with the first one, if the architecture has instructions for it. For example, an x86 processor can load s into AX, then store AH into c1 and AL into c2, with no extra instructions beyond the store operations. The second one is much less likely to be a "known common pattern": I certainly have never seen that variant used in code, whereas the shift/and method is very commonly used, often in "pixel loops", meaning it's critical to implement good optimisation for it.

As always, measure, measure and measure again. And unless you are ONLY interested in your particular machine's performance, try it on different models/manufacturers of processors, so you don't make something that is 5% faster on your model of machine, but 20% slower on another model.
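A very rough way to start measuring is sketched below, using `std::chrono::steady_clock`. The helper name `time_ns` and its parameters are hypothetical, and a serious benchmark would need more care (warm-up, repeated runs, checking the generated assembly), since an optimising compiler may well reduce both variants to identical code.

```cpp
#include <chrono>
#include <cstdint>

// Applies f to every 16-bit value, `rounds` times over, and returns the
// elapsed time in nanoseconds. The running sum is written to `sink` so the
// work has an observable result.
template <typename F>
long long time_ns(F f, int rounds, std::uint32_t& sink) {
    const auto start = std::chrono::steady_clock::now();
    std::uint32_t sum = 0;
    for (int r = 0; r < rounds; ++r) {
        for (std::uint32_t v = 0; v <= 0xFFFF; ++v) {
            sum += f(static_cast<std::uint16_t>(v));
        }
    }
    sink = sum;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```

Each variant can be passed in as a lambda, for example `time_ns([](std::uint16_t s) { return std::uint8_t(s & 0xFF); }, 1000, sink)`. Comparing the two `sink` values afterwards is a cheap check that both variants computed the same thing.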

Mats Petersson answered Dec 22 '22 07:12
