Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get bits from byte

Tags:

c++

c++11

bitset

I have the following function:

int GetGroup(unsigned bitResult, int iStartPos, int iNumOfBites)
{
 return (bitResult >> (iStartPos + 1- iNumOfBites)) & ~(~0 << iNumOfBites);
}

The function returns group of bits from a byte.
i.e if bitResult=102 (01100110)2, iStartPos=5, iNumOfBites=3
Output: 2 (10)2
For iStartPos=7, iNumOfBites=4
Output: 3 (0110)2

I'm looking for better way / "friendly" to do that, i.e with bitset or something like that.
Any suggestion?

like image 817
Evyatar Elmaliah Avatar asked Nov 05 '15 20:11

Evyatar Elmaliah


2 Answers

(src >> start) & ((1UL << len)-1)     // or 1ULL << if you need a 64-bit mask

is one way to express extraction of len bits, starting at start. (In this case, start is the LSB of the range you want. Your function requires the MSB as input.) This expression is from Wikipedia's article on the x86 BMI1 instruction set extensions.

Both ways of producing the mask look risky in case len is the full width of the type, though. (The corner-case of extracting all the bits). Shifts by the full width of the type can either produce zero or unchanged. (It actually invokes undefined behaviour, but this is in practice what happens if the compiler can't see that at compile time. x86 for example masks the shift count down to the 0-31 range (for 32bit shifts). With 32bit ints:

  • If 1 << 32 produces 1, then 1-1 = 0, so the result will be zero.

  • If ~0 << 32 produces ~0, rather than 0, the mask will be zero.

Remember that 1<<len is undefined behaviour for len too large: unlike writing it as 0x3ffffffffff or whatever, no automatic promotion to long long happens, so the type of the 1 matters.


I think from your examples you want the bits [iStartPos : iStartPos - iNumOfBites], where bits are numbered from zero.

The main thing I'd change in your function is the naming of the function and variables, and add a comment.

  • bitResult is the input to the function; don't use "result" in its name.
  • iStartPos ok, but a little verbose
  • iNumOfBites Computers have bits and bytes. If you're dealing with bites, you need a doctor (or a dentist).

Also, the return type should probably be unsigned.

// extract bits [msb : msb-len] from input into the low bits of the result
unsigned BitExtract(unsigned input, int msb, int len)
{
  return (input >> (msb-len + 1)) & ~(~0 << len);
}

If your start-position parameter was the lsb, rather than msb, the expression would be simpler, and the code would be smaller and faster (unless that just makes extra work for the caller). With LSB as a param, BitExtract is 7 instructions, vs. 9 if it's MSB (on x86-64, gcc 5.2).


There's also a machine instruction (introduced with Intel Haswell, and AMD Piledriver) that does this operation. You will get somewhat smaller and slightly faster code by using it. It also uses the LSB, len position convention, not MSB, so you get shorter code with LSB as an argument.

Intel CPUs only know the version that would require loading an immediate into a register first, so when the values are compile-time constants, it doesn't save much compared to simply shifting and masking. e.g. see this post about using it or pextr for RGB32 -> RGB16. And of course it doesn't matter whether the parameter is the MSB or LSB of the desired range, if start and len are both compile time constants.

Only AMD implements a version of bextr that can have the control mask as an immediate constant, but unfortunately it seems gcc 5.2 doesn't use the immediate version for code that uses the intrinsic (even with -march=bdver2 (i.e. bulldozer v2 aka piledriver). (It will generate bextr with an immediate argument on its own in some cases with -march=bdver2.)

I tested it out on godbolt to see what kind of code you'd get with or without bextr.

#include <immintrin.h>
// Intel ICC uses different intrinsics for bextr

// extract bits [msb : msb-len] from input into the low bits of the result
unsigned BitExtract(unsigned input, int msb, int len)
{
#ifdef __BMI__  // probably also need to check for __GNUC__
  return __builtin_ia32_bextr_u32(input, (len<<8) | (msb-len+1) );
#else
  return (input >> (msb-len + 1)) & ~(~0 << len);
#endif
}

It would take an extra instruction (a movzx) to implement a (msb-len+1)&0xff safety check to avoid the start byte from spilling into the length byte. I left it out because it's nonsense to ask for a starting bit outside the 0-31 range, let alone the 0-255 range. Since it won't crash, just return some other nonsense result, there's not much point.

Anyway, bext saves quite a few instructions (if BMI2 shlx / shrx isn't available either! -march=native on godbolt is Haswell, and thus includes BMI2 as well.)

But bextr on Intel CPUs decodes to 2 uops (http://agner.org/optimize/), so it's not very useful at all compared to shrx / and, except for saving some code size. pext is actually better for throughput (1 uop / 3c latency), even though it's a way more powerful instruction. It is worse for latency, though. And AMD CPUs run pext very slowly, but bextr as a single uop.

like image 80
Peter Cordes Avatar answered Oct 08 '22 19:10

Peter Cordes


I would probably do something like the following in order to provide additional protections around errors in arguments and to reduce the amount of shifting.

I am not sure if I understood the meaning of the arguments you are using so this may require a bit of tweaking.

And I am not sure if this is necessarily any more efficient since there are a number of decisions and range checks made in the interests of safety.

/*
 *  Arguments:    const unsigned bitResult byte containing the bit field to extract
 *                const int iStartPos  zero based offset from the least significant bit
 *                const int iNumOfBites  number of bits to the right of the starting position
 *
 *  Description: Extract a bitfield beginning at the specified position for the specified
 *               number of bits returning the extracted bit field right justified.
*/
int GetGroup(const unsigned bitResult, const int iStartPos, const int iNumOfBites)
{
    // masks to remove any leading bits that need to disappear.
    // we change starting position to be one based so the first element is unused.
    const static unsigned bitMasks[] = {0x01, 0x01, 0x03, 0x07, 0x0f, 0x1f, 0x3f, 0x7f, 0xff};

    int iStart = (iStartPos > 7) ? 8 : iStartPos + 1;
    int iNum = (iNumOfBites > 8) ? 8 : iNumOfBites;

    unsigned retVal = (bitResult & bitMasks[iStart]);
    if (iStart > iNum) {
        retVal >>= (iStart - iNum);
    }

    return retVal;
}
like image 43
Richard Chambers Avatar answered Oct 08 '22 20:10

Richard Chambers