Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bitwise operations. Is this code safe and portable?

Tags:

c++

I need to compute the Hamming distance between bitsets that are represented as char arrays. This is a core operation, so it must be as fast as possible. I have something like this:

const int N = 32; // 32 always

// returns the number of bits that are ones in a char
int countOnes_uchar8(unsigned char v);

// pa and pb point to arrays of N items
int hamming(const unsigned char *pa, const unsigned char *pb)
{
  int ret = 0;
  for(int i = 0; i < N; ++i, ++pa, ++pb)
  {
    ret += countOnes_uchar8(*pa ^ *pb);
  }
  return ret;
}

After profiling, I noticed that operating on ints is faster, so I wrote:

const int N = 32; // 32 always

// returns the number of bits that are ones in a int of 32 bits
int countOnes_int32(unsigned int v);

// pa and pb point to arrays of N items
int hamming(const unsigned char *pa, const unsigned char *pb)
{
  const unsigned int *qa = reinterpret_cast<const unsigned int*>(pa);
  const unsigned int *qb = reinterpret_cast<const unsigned int*>(pb);

  int ret = 0;
  for(int i = 0; i < N / sizeof(unsigned int); ++i, ++qa, ++qb)
  {
    ret += countOnes_int32(*qa ^ *qb);
  }
  return ret;
}

Questions

1) Is that cast from unsigned char * to unsigned int * safe?

2) I work on a 32-bit machine, but I would like the code to work on a 64-bit machine. Does sizeof(unsigned int) returns 4 in both machines, or is it 8 on a 64-bit one?

3) If sizeof(unsigned int) returned 4 in a 64-bit machine, how would I be able to operate on a 64-bit type, with long long?

like image 840
ChronoTrigger Avatar asked Sep 06 '13 13:09

ChronoTrigger


2 Answers

Is that cast from unsigned char * to unsigned int * safe?

Formally, it gives undefined behaviour. Practically, it will work on just about any platform if the pointer is suitably aligned for unsigned int. On some platforms, it may fail, or perform poorly, if the alignment is wrong.

Does sizeof(unsigned int) returns 4 in both machines, or is it 8 on a 64-bit one?

It depends. Some platforms have 64-bit int, and some have 32-bit. It would probably make sense to use uint64_t regardless of platform; on a 32-bit platform, you'd effectively be unrolling the loop (processing two 32-bit values per iteration), which might give a modest improvement.

how would I be able to operate on a 64-bit type, with long long?

uint64_t, if you have a C++11 or C99 library. long long is at least 64 bits, but might not exist on a pre-2011 implementation.

like image 183
Mike Seymour Avatar answered Nov 04 '22 07:11

Mike Seymour


1) No, it is not safe/portable, it is undefined behavior. There are systems where char is larger than one byte and there is no guarantee that the char pointer is properly aligned.

2) sizeof(int) might in theory be anything on a 64 bit machine. In practice, it will be either 4 or 8.

3) long long is most likely 64 bits but there are no guarantees there either. If you want guarantees, use uint64_t. However, for your specific algorithm I don't see why the sizeof() the data chunk would matter.

Consider using the types in stdint.h instead, they are far more suitable for portable code. Instead of char, int or long long, use uint_fast8_t. This will let the compiler pick the fastest integer for you, in a portable manner.

As a sidenote, you should consider implementing "countOnes" as a lookup table, working on 4, 8 or 32 bit level, depending on what is most optimal for your system. This will increase program size but reduce execution time. Maybe try to implement some form of adaptive lookup table which depends on sizeof(uint_fast8_t).

like image 43
Lundin Avatar answered Nov 04 '22 09:11

Lundin